Integrated systems and methods for diversity generation and screening

ABSTRACT

Integrated systems and methods for diversity generation and screening are provided. The systems use common fluid and array handling components to provide nucleic acid diversification, transcription, translation, product screening and subsequent diversification reactions.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] The present application claims priority to and benefit of prior US provisional patent applications INTEGRATED SYSTEMS AND METHODS FOR DIVERSITY GENERATION AND SCREENING by Bass et al., U.S. Ser. No. 60/175,551, filed Jan. 11, 2000, and INTEGRATED SYSTEMS AND METHODS FOR DIVERSITY GENERATION AND SCREENING by Bass et al., U.S. Ser. No. 60/213,947, filed Jun. 23, 2000. The present application claims priority to and benefit of these earlier applications pursuant to 35 U.S.C. §119 and §120, as well as any other applicable statute or rule.

FIELD OF THE INVENTION

[0002] The present invention relates to automated devices and systems for performing nucleic acid recombination, mutation, shuffling and other diversity generating reactions in vitro, as well as related methods of performing automated diversity generation reactions. The devices and systems can include, e.g., modules for generating diversity in nucleic acids, for recombining these nucleic acids, for arraying the nucleic acids, for making or copying arrays of reaction mixtures comprising the nucleic acids and for performing in vitro translation and/or transcription of diverse libraries of nucleic acids. Related methods for performing such shuffling reactions in vitro are also provided.

BACKGROUND OF THE INVENTION

[0003] Today's laboratory attempts to meet the dramatically increasing need for analytical data brought about by the increased pace of new product development, increased research, demands for stricter quality control, and the like. Labs deliver data in a timely, cost-efficient way while ensuring precise results, clear documentation, and minimal use of skilled (and, therefore, expensive) personnel. For example, automated systems have been proposed to assess a variety of biological phenomena, including, e.g., expression levels of genes in response to selected stimuli (Service (1998) “Microchips Arrays Put DNA on the Spot” Science 282: 396-399), high throughput DNA genotyping (Zhang et al. (1999) “Automated and Integrated System for High-Throughput DNA Genotyping Directly from Blood” Anal. Chem. 71: 1138-1145) and many others. Similarly, integrated systems for performing mixing experiments, DNA amplification, DNA sequencing and the like are also available (See, e.g., Service (1998) “Coming Soon: the Pocket DNA Sequencer” Science 282: 399-401).

[0004] Improvements in laboratory automation continually increase the productivity of laboratory workers and provide for more precise results, clearer documentation and the like, as compared to the performance of unautomated tasks. The automation of laboratory procedures using devices and/or systems dedicated to particular tasks in the laboratory substantially enhances the speed and reproducibility of a variety of experimental tasks. Product research, regulatory approval and quality control in industries such as pharmaceuticals, chemicals, and biotechnology routinely involve the testing of thousands (or even hundreds of thousands) of samples.

[0005] Automated systems typically perform, e.g., repetitive fluid handling operations (e.g., pipetting) for transferring material to or from reagent storage systems such as microtiter trays, which are used as basic container elements for a variety of automated laboratory methods. Similarly, the systems manipulate, e.g., microtiter trays and control a variety of environmental conditions such as temperature, exposure to light or air, and the like.

[0006] Many such automated systems are commercially available. For example, a variety of automated systems are available from the Zymark Corporation (Zymark Center, Hopkinton, Mass.), which utilize various Zymate systems (see also, http://www.zymark.com/), which typically include, e.g., robotics and fluid handling modules. Similarly, the common ORCA® robot, which is used in a variety of laboratory systems, e.g., for microtiter tray manipulation, is also commercially available, e.g., from Beckman Coulter, Inc. (Fullerton, Calif.).

[0007] More recently, microfluidic systems have established the potential for even greater automation and laboratory productivity increases. In these microfluidic systems, automated fluid handling and other sample manipulations are controlled at the microscale level. Such systems are now commercially available. For example, the Hewlett-Packard (Agilent Technologies) HP2100 bioanalyzer utilizes LabChip™ technology to manipulate extremely small sample volumes. In this “lab-on-a-chip” system, sample preparation, fluid handling and biochemical analysis steps are carried out within the confines of a microchip. The chips have microchannels fabricated, e.g., in glass, providing interconnected networks of fluid reservoirs and pathways.

[0008] While many automated systems are now available, the application of automated systems to non-routine sample handling and analysis remains challenging. In particular, the application of automation to new technologies in the field of molecular biology would be desirable. For example, some of the most significant new classes of techniques in molecular biology are found in the field of rapid forced molecular evolution. In rapid evolution processes, diversity is generated in nucleic acids of interest via mutation, recombination, or other mechanisms, and the resulting nucleic acids are screened for one or more desirable activities, or encoded activities. These processes are repeated until a nucleic acid possessing or encoding a desired activity level is produced. The present invention provides significant new automated systems and methods which facilitate nucleic acid shuffling and other diversity generating/screening processes of interest.

SUMMARY OF THE INVENTION

[0009] The present invention provides automated devices for performing nucleic acid shuffling and other diversity generating reactions in vitro and in vivo. The devices can include, e.g., modules for generating diversity in nucleic acids, for recombining these nucleic acids, for arraying the nucleic acids, for making or copying arrays of reaction mixtures comprising shuffled, mutated or otherwise diversified nucleic acids and for performing in vitro translation and/or transcription of diverse libraries of nucleic acids (including in an array-based format). Related methods for performing automated mutation, recombination and/or shuffling reactions in vitro and in vivo are also provided.

[0010] For example, the present invention comprises, e.g., devices and/or integrated systems which include a physical or logical array of reaction mixtures. The reaction mixtures include one or more diversified (e.g., shuffled or mutagenized) nucleic acids and/or one or more transcribed shuffled or transcribed mutagenized nucleic acids and one or more in vitro transcription and/or translation reagents. A variety of variant forms and implementations of these devices/integrated systems, as well as related methods, are described herein.

[0011] The devices and integrated systems optionally include any of a variety of component or module elements. These can include, e.g., one or more duplicates of the physical or logical array. A bar-code based sample tracking module, which includes a bar code reader and a computer readable database comprising at least one entry for at least one array or at least one array member, can also be included, in which the entry corresponds to at least one bar code. The device or integrated system can include a long term storage device such as a refrigerator, an electrically powered cooling device, a device capable of maintaining a temperature of <0° C., a freezer, a device which uses liquid nitrogen or liquid helium for cooling, storing or freezing samples, a container comprising wet or dry ice, a constant temperature and/or constant humidity chamber or incubator, or an automated sample storage or retrieval unit. The device or integrated system can also include one or more modules for moving arrays or array members into the long term storage device.
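
By way of illustration only, the bar-code based sample tracking database described above might be sketched as follows. The class names, fields and storage labels are hypothetical and are not taken from the present description; the sketch merely shows one entry per bar code, where an entry refers either to a whole array or to a single array member.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TrackedItem:
    """One database entry: a whole array, or a single member (well) of one."""
    array_id: str
    well: Optional[str]      # None means the entry describes the whole array
    storage_location: str    # free-text label, e.g. a freezer shelf (assumed)


class SampleTracker:
    """Hypothetical bar-code index; the schema is an illustrative assumption."""

    def __init__(self) -> None:
        self._entries: Dict[str, TrackedItem] = {}

    def register(self, barcode: str, item: TrackedItem) -> None:
        # Each bar code corresponds to exactly one entry.
        if barcode in self._entries:
            raise ValueError(f"bar code {barcode!r} is already assigned")
        self._entries[barcode] = item

    def lookup(self, barcode: str) -> TrackedItem:
        # Called after the bar code reader scans a plate or tube.
        return self._entries[barcode]

    def move(self, barcode: str, new_location: str) -> None:
        # Record a transfer, e.g. into a long term storage device.
        self._entries[barcode].storage_location = new_location
```

In such a sketch, a module that moves an array into long term storage would call `move` with the scanned bar code so that the database entry continues to reflect the physical location of the array.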

[0012] The device or integrated system can, and often does, include a copy array comprising a copy of each of a plurality of members of the one or more shuffled or mutagenized nucleic acids in a physically or logically accessible arrangement of the members. A plurality of the reaction mixtures can include one or more translation products or one or more transcription products, or both one or more translation products and one or more transcription products. The array of reaction mixtures can be in a solid phase, liquid phase or mixed phase array which includes one or more of: the one or more shuffled or mutated nucleic acids, the one or more transcribed shuffled nucleic acids, and the one or more in vitro translation reagents. The one or more shuffled or mutated nucleic acids are optionally homologous or heterologous. The one or more transcribed shuffled or mutated nucleic acid(s) typically, though not necessarily, includes an mRNA.

[0013] The one or more in vitro translation reagents which are optionally present in the array typically include, e.g., reticulocyte lysates, rabbit reticulocyte lysates, canine microsome translation mixtures, wheat germ in vitro translation (IVT) mixtures, E. coli lysates, or the like. As already noted, the arrays optionally further include one or more in vitro transcription reagents, such as an E. coli lysate, an E. coli extract, an E. coli s20 extract, a canine microsome system, a HeLa nuclear extract in vitro transcription component, an SP6 polymerase, a T3 polymerase, a T7 RNA polymerase, or the like.

[0014] The device or integrated system can include a nucleic acid shuffling or mutagenesis module, which accepts input nucleic acids or character strings corresponding to input nucleic acids and manipulates the input nucleic acids or the character strings corresponding to input nucleic acids to produce output nucleic acids, which include the one or more shuffled or mutagenized nucleic acids in the reaction mixture array. The output nucleic acids optionally comprise one or more sequence which controls transcription or translation. Such modules include a DNA shuffling module, which accepts input DNAs or character strings corresponding to input DNAs and manipulates the input DNAs or the character strings corresponding to input DNAs to produce output DNAs, which output DNAs include the one or more shuffled DNAs in the reaction mixture array. The nucleic acid shuffling or mutagenesis module is optionally preceded in the system or device by a module which allows overlapping synthetic oligonucleotides to be first assembled into oligonucleotide multimers or functional open reading frames prior to entering the mutagenesis or shuffling module. The module(s) can be operatively linked to or include a thermocycling device, or a mutagenesis module. In one aspect, the nucleic acid shuffling or mutagenesis module fragments the input nucleic acids to produce nucleic acid fragments. Alternately, the input nucleic acids optionally include cleaved or synthetic nucleic acid fragments. Optionally, the shuffling or mutagenesis module is mechanically, electronically, robotically or fluidically coupled to at least one other array operation module. The nucleic acid shuffling or mutagenesis module can perform any of a variety of operations, including PCR, StEP PCR, uracil incorporation, chain termination, or the like. Optionally, the nucleic acid shuffling module separates, identifies, purifies or immobilizes any product elongated nucleic acid.
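
As a purely illustrative sketch of manipulating character strings corresponding to input nucleic acids, the following routine recombines aligned, equal-length parental strings by switching templates at randomly chosen crossover points. The function name, the equal-length requirement and the random choice of crossover points are simplifying assumptions made here for illustration; they are not presented as the manipulation actually performed by the module.

```python
import random
from typing import List


def recombine(parents: List[str], n_crossovers: int, rng: random.Random) -> str:
    """Build one output string by copying from one parent at a time and
    switching to a different parent at each crossover point."""
    if len(parents) < 2:
        raise ValueError("at least two parental strings are required")
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to equal length")
    if not 0 <= n_crossovers < length:
        raise ValueError("n_crossovers must be between 0 and length - 1")

    # Distinct interior positions at which the template switches.
    points = sorted(rng.sample(range(1, length), n_crossovers))
    template = rng.randrange(len(parents))
    pieces = []
    start = 0
    for point in points + [length]:
        pieces.append(parents[template][start:point])
        start = point
        # Switch to any parent other than the current template.
        template = rng.choice([i for i in range(len(parents)) if i != template])
    return "".join(pieces)
```

For example, `recombine(["AAAAAAAAAA", "CCCCCCCCCC"], 3, random.Random(0))` returns a ten-character string made of four alternating parental segments. An output string of this kind could then be passed to a synthesis or assembly step to produce the corresponding output nucleic acid.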

[0015] The nucleic acid shuffling module optionally includes an identification portion which identifies one or more nucleic acid portion or subportion (e.g., by sequencing or any other product deconvolution method). Similarly, the nucleic acid shuffling module optionally includes a fragment length purification portion which purifies selected length fragments of the nucleic acid fragments. In one embodiment, the nucleic acid shuffling module permits hybridization of the nucleic acid fragments. The module can also include a polymerase which elongates the hybridized nucleic acid.

[0016] The module can control incorporation of features into product nucleic acids. For example, the nucleic acid shuffling module can combine one or more translation or transcription control sequence into elongated product nucleic acids. The translation or transcription control sequence(s) can be combined into the elongated nucleic acid using the polymerase, or a ligase, or both. The nucleic acid shuffling module optionally determines a recombination frequency or a length, or both a recombination frequency and a length, for any product nucleic acid(s). Similarly, the nucleic acid shuffling module can determine nucleic acid length by detecting incorporation of one or more labeled nucleic acid or nucleotide into the resulting elongated nucleic acid. For example, the nucleic acid shuffling module optionally determines nucleic acid length by detecting one or more label (e.g., dye, radioactive label, biotin, digoxin, or a fluorophore) associated with any product nucleic acid. For example, the nucleic acid shuffling module can determine nucleic acid length with a fluorogenic 5′ nuclease assay.

[0017] The devices and integrated systems can utilize conventional or microscale construction. Thus, in one aspect, the physical or logical array of reaction mixtures is optionally incorporated into a microscale device, or at least one of the reaction mixtures is incorporated into a microscale device, or the one or more shuffled or mutagenized nucleic acids or the one or more transcribed shuffled or mutagenized nucleic acids is found within a microscale device, or the one or more in vitro translation reagents is optionally found within a microscale device. The nucleic acid shuffling module optionally comprises one or more microscale channel (e.g., a microcapillary or chip) through which a shuffling reagent or product is flowed. Liquid flow through the device is mediated, e.g., by capillary flow, differential pressure between one or more inlets and outlets, electroosmosis, hydraulic or mechanical pressure, or peristalsis.

[0018] Nucleic acid fragments for use in the systems and devices of the invention are optionally contacted in a single pool, or in multiple pools. For example, the nucleic acid shuffling module optionally dispenses the resulting elongated nucleic acids into one or more multiwell plates, or onto one or more solid substrates, or into one or more microscale systems, or into one or more containers. The nucleic acid shuffling module optionally pre-dilutes any product nucleic acids and dispenses them into one or more multiwell plates, e.g., at a selected density per well of the product nucleic acid(s).

[0019] For example, in one embodiment, the nucleic acid shuffling module dispenses elongated nucleic acids into one or more master multiwell plates and/or PCR amplifies the resulting master array of elongated nucleic acids to produce an amplified array of elongated nucleic acids. Optionally, the module includes an array copy system which transfers aliquots from the wells of the one or more master multiwell plates to one or more copy multiwell plates. The array of reaction mixtures is optionally formed by separate or simultaneous addition of an in vitro transcription reagent and an in vitro translation reagent to the one or more copy multiwell plates, or to a duplicate set thereof.
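
The transfer of aliquots from a master multiwell plate to copy plates can be illustrated by a simple transfer list in which each master well maps to the same well position on every copy plate. The 96-well (8 by 12) layout, the plate identifiers and the tuple format below are assumptions chosen for illustration; other plate formats would be handled analogously.

```python
from typing import List, Tuple

ROWS = "ABCDEFGH"          # assumed 96-well layout: 8 rows
COLUMNS = range(1, 13)     # by 12 columns

# (source plate, source well, destination plate, destination well, volume in µL)
Transfer = Tuple[str, str, str, str, float]


def well_names() -> List[str]:
    """Well names in row-major order: A1, A2, ..., H12."""
    return [f"{row}{col}" for row in ROWS for col in COLUMNS]


def copy_transfers(master: str, copies: List[str], aliquot_ul: float) -> List[Transfer]:
    """List every aliquot transfer needed to copy one master plate,
    preserving well positions so the copies correspond to the master."""
    transfers: List[Transfer] = []
    for copy in copies:
        for well in well_names():
            transfers.append((master, well, copy, well, aliquot_ul))
    return transfers
```

Because positions are preserved, a product later identified in a given well of a copy plate can be traced directly to the same well of the master plate.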

[0020] In one embodiment, the device or integrated system further includes one or more sources of one or more nucleic acids. The one or more sources collectively or individually can include a first population of nucleic acids, wherein shuffled or mutant nucleic acids are produced by recombining the one or more members of the first population of nucleic acids. The one or more sources of nucleic acids include, e.g., at least one nucleic acid selected from: a synthetic nucleic acid, a DNA, an RNA, a DNA analogue, an RNA analogue, a genomic DNA, a cDNA, an mRNA, a DNA generated by reverse transcription, an nRNA, an aptamer, a polysome associated nucleic acid, a cloned nucleic acid, a cloned DNA, a cloned RNA, a plasmid DNA, a phagemid DNA, a viral DNA, a viral RNA, a YAC DNA, a cosmid DNA, a fosmid DNA, a BAC DNA, a P1-mid, a phage DNA, a single-stranded DNA, a double-stranded DNA, a branched DNA, a catalytic nucleic acid, an antisense nucleic acid, an in vitro amplified nucleic acid, a PCR amplified nucleic acid, an LCR amplified nucleic acid, a Qβ-replicase amplified nucleic acid, an oligonucleotide, a nucleic acid fragment, a restriction fragment and a combination thereof.

[0021] The device or integrated system optionally includes a population destination region, wherein, during operation of the device, one or more members of the first population are moved from the one or more sources of the one or more nucleic acids to the one or more destination regions (e.g., in the form of a solid phase array, a liquid phase array, a container, a microtiter tray, a microtiter tray well, a microfluidic component, a microfluidic chip, a test tube, a centrifugal rotor, a microscope slide, an organism, a cell, a tissue, a liposome, a detergent particle, or any combination thereof). Thus, the device or integrated system can include nucleic acid movement means (e.g., a fluid pressure modulator, an electrokinetic fluid force modulator, a thermokinetic modulator, a capillary flow mechanism, a centrifugal force modulator, a robotic armature, a pipettor, a conveyor mechanism, a peristaltic pump or mechanism, a magnetic field generator, an electric field generator, one or more fluid flow path, etc.) for moving the one or more members from the one or more sources of the one or more nucleic acids to the one or more destination regions (for example, nucleic acids to be recombined can be moved into contact with one another). During operation of the device, the in vitro transcription reagent or an in vitro translation reagent is typically flowed into contact with the members of the first population. Optionally, members of the first population are fixed (immobilized) at the one or more sources of one or more nucleic acids or at the one or more destination regions. During operation of the device, the first population of nucleic acids is optionally arranged into one or more physical or logical recombinant nucleic acid arrays, which are optionally duplicated.

[0022] The device or integrated system can include one or more reaction mixture arraying modules which move one or more of the one or more shuffled (or mutated) nucleic acids or the one or more transcribed shuffled or mutated nucleic acids or the in vitro translation reactant components into one or more selected spatial positions. This places the one or more shuffled, mutated or otherwise diversified nucleic acids or the one or more transcribed shuffled or otherwise diversified nucleic acids or the in vitro translation reactant component into one or more locations in the array of reaction mixtures. Thus, this module can be used to generate a recombined/mutated/shuffled nucleic acid master or duplicate array which physically or logically corresponds to positions of mutated, shuffled or other product nucleic acids in a reaction mixture array. The device or integrated system can include a nucleic acid amplification module, which module amplifies members of the mutated or shuffled nucleic acid master array, or a duplicate thereof. The arraying and amplification modules can be integrated in one module or device.

[0023] The amplification module can include a heating or cooling element (e.g., to perform PCR, LCR or the like). For example, in one embodiment, the amplification module includes a DNA micro-amplifier. For example, the micro-amplifier can include a programmable resistor, a micromachined zone heating chemical amplifier, a Peltier solid state heat pump, a heat pump, a heat exchanger, a hot air blower, a resistive heater, a refrigeration unit, a heat sink, a Joule-Thomson cooling device, or any combination thereof. The arraying/amplification module can produce a duplicate amplified array which produces amplicons of the nucleic acid master array, or duplicates thereof.

[0024] During operation of the overall device or system, the array of reaction mixtures produces an array of reaction mixture products. The device or integrated system can include one or more product identification or purification modules, which product identification modules identify one or more members of the array of reaction products. For example, product identification or purification modules can include one or more of: a gel, a polymeric solution, a liposome, a microemulsion, a microdroplet, an affinity matrix, a plasmon resonance detector, a BIACORE, a GC detector, an ultraviolet or visible light sensor, an epifluorescence detector, a fluorescence detector, a fluorescent array, a CCD, a digital imager, a scanner, a confocal imaging device, an optical sensor, a FACS detector, a micro-FACS unit, a temperature sensor, a mass spectrometer, a stereo-specific product detector, an ELISA reagent, an enzyme, an enzyme substrate, an antibody, an antigen, a refractive index detector, a polarimeter, a pH detector, a pH-stat device, an ion selective sensor, a calorimeter, a film, a radiation sensor, a Geiger counter, a scintillation counter, a particle counter, an H2O2 detection system, an electrochemical sensor, ion/gas selective electrodes, or a capillary electrophoresis element. For ease of detection, the one or more reaction product array members are optionally moved into proximity to the product identification module, or the product identification module can perform an xyz translation, thereby moving the product identification module proximal to the array of reaction products. Similarly, the one or more reaction product array members are optionally flowed into proximity to the product identification module, where an in-line purification system purifies the one or more reaction product array members from associated materials.

[0025] Typical reaction products include, e.g., one or more polypeptide, one or more nucleic acid, one or more catalytic RNA (e.g., a ribozyme), or one or more biologically active RNA (e.g., an anti-sense RNA). In one class of embodiments, the device or integrated system can include a source of one or more lipid which is flowed into contact with the one or more polypeptide, or into contact with the physical or logical array of reaction mixtures, or into contact with the one or more transcribed shuffled or mutagenized nucleic acids, thereby producing one or more liposomes or micelles comprising the polypeptide, reaction mixture components, or one or more transcribed shuffled or mutagenized nucleic acids. The reaction products can include one or more polypeptide which can be further modified by the system, e.g., by incubation with one or more protein refolding reagent. For example, refolding agents such as guanidine, guanidinium, urea, detergents, chelating agents, DTT, DTE, chaperonins and the like can be flowed into contact with the protein of interest.

[0026] Product identification or purification modules in the device or integrated system can include a protein detector, a protein purification means, or the like. The product identification or purification modules can also include an instruction set for discriminating between members of the array of reaction products based upon, e.g., a physical characteristic of the members, an activity of the members, concentrations of the members, or combinations thereof.

[0027] The device or integrated system can include a secondary product array produced by re-arraying members of the reaction product array such that the secondary product array has a selected concentration of product members in the secondary product array. The selected concentration is optionally approximately the same for a plurality of product members in the secondary product array. This facilitates comparison of activity or detectable feature levels across or among members of the secondary product array. In an alternate or complementary aspect, the device or integrated system can include an instruction set or physical or logical filter for determining a correction factor which accounts for variation in polypeptide concentration at different positions in the amplified physical or logical array of polypeptides.
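
One simple form that such a correction factor could take is sketched below. The sketch assumes, for illustration only, that the measured signal scales linearly with polypeptide concentration and that the median concentration across the array is used as the reference; neither assumption is stated in the present description, and the function names are hypothetical.

```python
from statistics import median
from typing import Dict


def correction_factors(concentrations: Dict[str, float]) -> Dict[str, float]:
    """Per-position factor that rescales a signal to the median concentration.
    Positions with no measurable polypeptide are skipped."""
    reference = median(concentrations.values())
    return {
        well: reference / conc
        for well, conc in concentrations.items()
        if conc > 0
    }


def corrected_signals(raw: Dict[str, float],
                      concentrations: Dict[str, float]) -> Dict[str, float]:
    """Apply the correction factors to raw activity readings."""
    factors = correction_factors(concentrations)
    return {well: raw[well] * factors[well] for well in factors if well in raw}
```

With concentrations of 1, 2 and 4 units at three positions and raw signals of 10, 20 and 40, each corrected signal is 20, which indicates that the three members have the same activity per unit of polypeptide.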

[0028] The device or integrated system can include a substrate addition module which adds one or more substrate to a plurality of members of the product array or the secondary product array. In this embodiment, a substrate conversion detector is provided to monitor formation of a product produced by contact between the one or more substrate and one or more of the plurality of members of the product array or the secondary product array. Formation of product or disappearance of substrate is monitored directly or indirectly, for example, by monitoring loss of the substrate or formation of product over time. Formation of the product or disappearance of substrate is optionally monitored enantioselectively, regioselectively or stereoselectively. For example, formation of the product or disappearance of substrate is optionally monitored by adding at least one isomer, enantiomer or stereoisomer in substantially pure form (e.g., independent of other potential isomers). Formation of the product is optionally monitored by detecting any detectable product, e.g., by monitoring formation of peroxide, protons, or halides, or reduced or oxidized cofactors, changes in heat or entropy which result from contact between the substrate and the product, changes in mass, charge, fluorescence, epifluorescence, by chromatography, luminescence or absorbance, of the substrate or the product, which result from contact between the substrate and the product.
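
Monitoring formation of product or loss of substrate over time can be reduced to a rate estimate. The following sketch fits a straight line by ordinary least squares to a series of readings and returns the slope; the assumption that the readings fall in a linear (initial rate) regime, and the function name, are illustrative only.

```python
from typing import Sequence


def initial_rate(times: Sequence[float], signal: Sequence[float]) -> float:
    """Least-squares slope of signal versus time.
    Positive for product formation, negative for substrate loss."""
    n = len(times)
    if n != len(signal) or n < 2:
        raise ValueError("need at least two paired readings")
    mean_t = sum(times) / n
    mean_s = sum(signal) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, signal))
    return sxy / sxx
```

Applied to each member of the product array, such a slope gives a single number per position that can be compared across the array.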

[0029] The device or integrated system optionally includes an array correspondence module, which identifies, determines or records the location of an identified product in the array of reaction mixture products which is identified by the one or more product identification modules, or which array correspondence module determines or records the location of at least a first nucleic acid member of the shuffled or mutant nucleic acid master array, or a duplicate thereof, or of an amplified duplicate array, where the member corresponds to the location of one or more member of the array of reaction products.

[0030] The device or integrated system optionally includes one or more secondary selection module which selects at least the first member for further recombination, which selection is based upon the location of a product identified by the product identification module(s).
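
The interplay of the array correspondence and secondary selection modules can be illustrated as a lookup from product locations to master array locations, followed by selection of the best-scoring members. The location format, the score dictionary and the top-N selection rule are hypothetical choices made for this sketch.

```python
from typing import Dict, List, Tuple

Location = Tuple[str, str]  # (plate identifier, well name); an assumed format


def select_for_recombination(scores: Dict[Location, float],
                             product_to_master: Dict[Location, Location],
                             top_n: int) -> List[Location]:
    """Return master-array locations of the top_n best-scoring products."""
    if top_n <= 0:
        return []
    # Rank product locations from highest to lowest score.
    ranked = sorted(scores, key=lambda loc: scores[loc], reverse=True)
    selected: List[Location] = []
    for product_location in ranked:
        master_location = product_to_master.get(product_location)
        # Skip products with no recorded correspondence, and avoid duplicates.
        if master_location is not None and master_location not in selected:
            selected.append(master_location)
        if len(selected) == top_n:
            break
    return selected
```

The returned master locations identify the nucleic acid members to be retrieved for a further round of recombination.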

[0031] The device or integrated system optionally includes a screening or selection module. For example, the module can include one or more of: an array reader, which detects one or more member of the array of reaction products; an enzyme which converts one or more member of the array of reaction products into one or more detectable products; a substrate which is converted by the one or more member of the array of reaction products into one or more detectable products; a cell which produces a detectable signal upon incubation with the one or more member of the array of reaction products; a reporter gene which is induced by one or more member of the array of reaction products; a promoter which is induced by one or more member of the array of reaction products, which promoter directs expression of one or more detectable products; and an enzyme or receptor cascade which is induced by the one or more member of the array of reaction products.

[0032] The device or integrated system can include a secondary recombination module, which physically contacts the first member, or an amplicon thereof, to an additional member of the shuffled or mutant nucleic acid master array, or the duplicate thereof, or the amplified duplicate array, thereby permitting physical recombination between the first and additional members.

[0033] The device or integrated system optionally includes a DNA fragmentation module which can include a recombination region. The DNA fragmentation module can include, e.g., one or more of: a nuclease, a mechanical shearing device, a polymerase, a random primer, a directed primer, a nucleic acid cleavage reagent, a chemical nucleic acid chain terminator, and an oligonucleotide synthesizer. During operation of the device, fragmented DNAs produced in the DNA fragmentation module are optionally recombined in the recombination region to produce one or more mutated, shuffled or otherwise altered nucleic acids.

[0034] The device or system commonly includes modules which perform one or more of: error prone PCR, site saturation mutagenesis, or site-directed mutagenesis. Many other diversity generating reactions which can be practiced in modules of the devices or systems are set forth herein.

[0035] The device or integrated system optionally includes a data structure embodied in a computer, such as an analog computer or a digital computer, or in a computer readable medium. The data structure corresponds to the one or more shuffled or otherwise modified nucleic acid(s).

[0036] The device or integrated system optionally includes one or more reaction mixtures which include one or more mutated or shuffled nucleic acids arranged in a microtiter tray at an average of approximately 0.1-100 shuffled or otherwise modified nucleic acids per well, e.g., an average of approximately 1-5 such nucleic acids per well.
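
The practical meaning of an average number of nucleic acids per well can be illustrated with a short calculation. The sketch assumes that molecules are distributed among wells independently and at random, so that the number per well follows a Poisson distribution; this statistical model and the function names are assumptions introduced here for illustration.

```python
import math
from typing import Dict


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a well receives exactly k molecules."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def well_occupancy(mean: float) -> Dict[str, float]:
    """Expected fractions of empty, singly occupied and multiply occupied wells."""
    empty = poisson_pmf(0, mean)
    single = poisson_pmf(1, mean)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}
```

Under this model, an average of one nucleic acid per well leaves roughly 37% of wells empty and roughly 37% with exactly one molecule, which is one reason an average in the range of about 1-5 per well may be a useful compromise between empty wells and mixed wells.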

[0037] The device or integrated system optionally includes a diluter which pre-dilutes the concentration of the one or more shuffled, modified or mutated nucleic acids prior to addition of the shuffled or mutant nucleic acids to the reaction mixtures. The concentration of the one or more modified, mutated or shuffled nucleic acids after pre-dilution is about 0.01 to 100 molecules per microliter.
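
The arithmetic behind such a pre-dilution can be sketched as follows. The conversion from mass concentration to molecules per microliter uses the common approximation of about 650 g/mol per base pair of double-stranded DNA, which is an assumption of this sketch rather than a value given in the present description; the function names are likewise hypothetical.

```python
AVOGADRO = 6.02214076e23           # molecules per mole
GRAMS_PER_MOLE_PER_BP = 650.0      # assumed average for double-stranded DNA


def molecules_per_ul(ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA mass concentration to molecules per microliter."""
    grams_per_ul = ng_per_ul * 1e-9
    moles_per_ul = grams_per_ul / (length_bp * GRAMS_PER_MOLE_PER_BP)
    return moles_per_ul * AVOGADRO


def dilution_factor(stock_molecules_per_ul: float,
                    target_molecules_per_ul: float) -> float:
    """Fold dilution needed to bring the stock to the target concentration."""
    if target_molecules_per_ul <= 0 or stock_molecules_per_ul < target_molecules_per_ul:
        raise ValueError("target must be positive and no greater than the stock")
    return stock_molecules_per_ul / target_molecules_per_ul


def dispense_volume_ul(target_molecules_per_ul: float, mean_per_well: float) -> float:
    """Volume to dispense so that each well receives mean_per_well on average."""
    return mean_per_well / target_molecules_per_ul
```

For instance, under the stated approximation a 1 ng/µL stock of a 1,000 base pair product contains on the order of 10^9 molecules per microliter, so reaching 1 molecule per microliter requires a dilution of about a billion-fold, typically performed as a series of smaller dilutions.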

[0038] In one class of embodiments, the reaction mixtures are produced in the device or system by adding the in vitro translation reactant and, optionally, an in vitro transcription reagent, to a duplicate shuffled or mutated nucleic acid array. The duplicate shuffled or mutated nucleic acid array is duplicated from a master array of the shuffled or mutated nucleic acids produced by spatially or logically separating members of a population of the shuffled or mutated nucleic acids to produce a physical or logical array of the shuffled or mutated nucleic acids. For example, the array can be produced by one or more arraying technique, including (1) lyophilizing members of the population of mutated, shuffled or otherwise altered nucleic acids on a solid surface, thereby forming a solid phase array, (2) chemically coupling members of the population of mutated, shuffled or otherwise altered nucleic acids to a solid surface, thereby forming a solid phase array, (3) rehydrating members of the population of mutated, shuffled or otherwise altered nucleic acids on a solid surface, thereby forming a liquid phase array, (4) cleaving chemically coupled members of the population of mutated, shuffled or otherwise altered nucleic acids from a solid surface, thereby forming a liquid phase array, (5) accessing one or more physically separated logical array members from one or more sources of mutated, shuffled or otherwise altered nucleic acids and flowing the physically separated logical array members to one or more destination, the one or more destinations constituting a logical array of the mutated, shuffled or otherwise altered nucleic acids, and (6) printing members of a population of mutated, shuffled or otherwise altered nucleic acids onto a solid material to form a solid phase array. Optionally, greater than about 1% of the physical or logical array of reaction mixtures comprise shuffled or mutant nucleic acids having one or more base changes relative to a parental nucleic acid.

[0039] In one aspect, one or more mutated, recombined (e.g., shuffled) or otherwise modified nucleic acids are produced by synthesizing a set of overlapping oligonucleotides, or by cleaving a plurality of homologous nucleic acids to produce a set of cleaved homologous nucleic acids, or both, and permitting recombination to occur between the set of overlapping oligonucleotides, the set of cleaved homologous nucleic acids, or both the set of overlapping oligonucleotides and the set of cleaved homologous nucleic acids.

[0040] In one aspect, the invention provides a diversity generation device. The device includes a programmed thermocycler and a fragmentation module operably coupled to the programmed thermocycler. The programmed thermocycler typically includes a thermocycler operably coupled to a computer which includes one or more instruction set, e.g., for calculating an amount of uracil and an amount of thymidine for use in the programmed thermocycler, calculating one or more crossover region between two or more parental nucleotides, calculating an annealing temperature, calculating an extension temperature, selecting one or more parental nucleic acid sequence, or the like.

[0041] The one or more instruction set receives user input data and sets up one or more cycle to be performed by the programmed thermocycler. The input data typically includes one or more parental nucleic acid sequence, a desired crossover frequency, an extension temperature, and/or an annealing temperature, or other features which control the reaction of interest.

[0042] In one aspect, the one or more instruction set calculates an amount of uracil and an amount of thymidine based on a desired fragment size. In other aspects, the one or more instruction set directs the one or more cycle on the diversity generation device, e.g., amplifies the one or more parental nucleic acid sequence, fragments the one or more parental nucleic acid sequence to produce one or more nucleic acid fragment, reassembles the one or more nucleic acid fragment to produce one or more mutated, shuffled or otherwise altered nucleic acid, and/or amplifies the one or more mutated, shuffled or otherwise altered nucleic acid. For example, the set can direct amplifying the one or more parental nucleic acid sequence in the presence of uracil. Optionally, the one or more cycle pauses between steps to allow addition of one or more fragmentation reagent.
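The specification does not give the formula used to relate the uracil and thymidine amounts to fragment size. A minimal sketch of one possible calculation follows, under assumptions that are entirely this example's own: uracil and thymidine nucleotides are incorporated with equal efficiency, every incorporated uracil is cleaved, and only one strand is considered. Under those assumptions the expected spacing between cleavage sites is 1/(fT x p), where fT is the fraction of T positions in the parental sequence and p is the fraction of those positions carrying uracil.

```python
def uracil_fraction_for_fragment_size(parent_seq, mean_fragment_size):
    """Estimate the fraction of T positions that should carry uracil.

    Hypothetical model (not from the specification): equal incorporation
    efficiency, complete cleavage at every uracil, single strand only.
    """
    seq = parent_seq.upper()
    if not seq or mean_fragment_size <= 0:
        raise ValueError("need a non-empty sequence and a positive fragment size")
    t_fraction = seq.count("T") / len(seq)
    if t_fraction == 0:
        raise ValueError("sequence contains no T positions to substitute")
    u_fraction = 1.0 / (t_fraction * mean_fragment_size)
    if u_fraction > 1.0:
        raise ValueError("requested fragments are shorter than the T spacing allows")
    return u_fraction


def uracil_to_thymidine_ratio(u_fraction):
    """Convert a uracil fraction p into a uracil:thymidine ratio p / (1 - p)."""
    if u_fraction >= 1.0:
        return float("inf")
    return u_fraction / (1.0 - u_fraction)


if __name__ == "__main__":
    p = uracil_fraction_for_fragment_size("ACGT" * 25, mean_fragment_size=40)
    print(p, uracil_to_thymidine_ratio(p))
```

An empirically calibrated relationship would replace this idealized model in practice, since real polymerases and cleavage enzymes do not behave ideally.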

[0043] The one or more instruction set optionally performs one or more calculation based on one or more theoretical prediction of a nucleic acid melting temperature or on one or more set of empirical data, which empirical data comprises a comparison of one or more nucleic acid melting temperature. The one or more instruction set optionally instructs the fragmentation module to fragment the parental nucleic acids to produce one or more nucleic acid fragments having a desired mean fragment size.
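One simple theoretical prediction of melting temperature is the Wallace rule for short oligonucleotides, Tm = 2(A+T) + 4(G+C) in degrees Celsius. The sketch below uses it only as an example of the kind of calculation an instruction set might perform; the specification does not say which prediction method is used, and the 5 degree offset for an annealing temperature is a common rule of thumb assumed here, not a value from the source.

```python
def wallace_tm(oligo):
    """Wallace-rule melting temperature (deg C) for a short DNA oligo."""
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def suggest_annealing_temp(oligos, offset=5):
    """Suggest an annealing temperature a few degrees below the lowest Tm.

    `offset` is an assumed rule-of-thumb value, not taken from the source.
    """
    return min(wallace_tm(o) for o in oligos) - offset


if __name__ == "__main__":
    print(wallace_tm("ACGTACGTACGTAC"))
    print(suggest_annealing_temp(["ACGTACGTACGTAC", "GGGGCCCCGGGGCC"]))
```

The Wallace rule is only a rough estimate for oligos of roughly 14 to 20 bases; nearest-neighbor models or empirical data, as the paragraph above contemplates, would be preferred for longer fragments.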

[0044] The programmed thermocycler comprises a thermocycler and, optionally, software for performing one or more shuffling calculations, which software is embodied on a web page, an attached computer, an intranet server, or, e.g., installed directly in the thermocycler.

[0045] In one aspect, a similar diversity generation device is provided. The device includes a computer, which includes at least a first instruction set for creating one or more nucleic acid fragment sequence from one or more parental nucleic acid sequence and a synthesizer module, which synthesizes the one or more nucleic acid fragment sequence. The device also includes a thermocycler which generates one or more diverse sequence from the one or more nucleic acid fragment sequence. The first instruction set optionally limits or expands diversity of the one or more nucleic acid fragment sequence by adding or removing one or more amino acid having similar diversity; selecting a frequently used amino acid at one or more specific position; using one or more sequence activity calculation; using a calculated overlap with one or more additional oligonucleotide; based on an amount of degeneracy, or based on a melting temperature. In one aspect, the thermocycler performs an assembly/rescue PCR reaction.

[0046] The diversity generation device can include a synthesizer module having a microarray oligonucleotide synthesizer. For example, the synthesizer module optionally includes an ink-jet printer head based oligonucleotide synthesizer. The synthesizer module optionally synthesizes the one or more nucleic acid fragment sequences on a solid support. The synthesizer module optionally uses one or more mononucleotide coupling reactions or one or more trinucleotide coupling reactions to synthesize the one or more nucleic acid fragment sequence.

[0047] The computer optionally comprises at least a second instruction set, which second instruction set determines at least a first set of conditions for the assembly/rescue PCR reaction.

[0048] The device optionally further includes a screening module for screening the one or more diverse sequence for a desired characteristic. For example, the screening module optionally comprises a high-throughput screening module.

[0049] In a related aspect, a diversity generation kit is provided. For example, the kit can include the diversity generation devices above and one or more reagent for diversity generation. Example reagents include E. coli, a PCR reaction mixture comprising a mixture of uracil and thymidine, one or more uracil cleaving enzyme, and a PCR reaction mixture comprising standard dNTPs. The one or more uracil cleaving enzyme optionally includes a uracil glycosidase and an endonuclease. The mixture of uracil and thymidine comprises a desired ratio of uracil to thymidine, which desired ratio is calculated by the diversity generation device, based upon user selected inputs.

[0050] Optionally, the diversity generation kit can include one or more artificially evolved enzyme such as an artificially evolved polymerase. The kit can also include, e.g., packaging materials, a container adapted to receive the device or reagents, and instructional materials for use of the device.

[0051] The devices and integrated systems herein can include data tracking modules such as a bar-code based sample tracking module, which includes, e.g., a bar code reader and a computer readable database comprising at least one entry for at least one array or at least one array member, which entry is corresponded to at least one bar code. Long term storage devices can also be incorporated into the devices and integrated systems herein (and the methods herein can include storage in such long term storage modules). For example, as noted, the storage module can include, e.g., a refrigerator, an electrically powered cooling device, a device capable of maintaining a temperature of <0° C., a freezer, a device which uses liquid nitrogen or liquid helium for cooling, storing or freezing samples, a container comprising wet or dry ice, a constant temperature and/or constant humidity chamber or incubator, an automated sample storage or retrieval unit, a desiccator or moisture minimizing or reducing device, one or more modules for moving arrays or array members into the long term storage device, etc.
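The bar-code tracking database described above can be pictured as a mapping from each bar code to an entry for an array or array member. The following is a minimal in-memory sketch; the class names, fields and example bar code are hypothetical and chosen only for illustration, and a real system would use a persistent database fed by a bar code reader.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayEntry:
    """One database entry for an array or array member (hypothetical fields)."""
    plate_id: str
    well: str
    description: str


class SampleTracker:
    """In-memory stand-in for the computer readable tracking database."""

    def __init__(self):
        self._entries = {}

    def register(self, barcode, entry):
        # Each bar code corresponds to exactly one entry.
        if barcode in self._entries:
            raise ValueError(f"bar code {barcode!r} is already registered")
        self._entries[barcode] = entry

    def lookup(self, barcode):
        # Raises KeyError if the bar code has never been registered.
        return self._entries[barcode]


if __name__ == "__main__":
    tracker = SampleTracker()
    tracker.register("BC0001", ArrayEntry("master-01", "A1", "shuffled library member"))
    print(tracker.lookup("BC0001"))
```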

[0052] As noted in more detail herein, the invention provides devices and integrated systems, e.g., which include a physical or logical array of reaction mixtures, each reaction mixture comprising one or more shuffled or mutagenized nucleic acids and one or more transcribed shuffled or transcribed mutagenized nucleic acids or one or more in vitro translation reagents. Also provided are libraries of shuffled or mutated or mutagenized nucleic acids formatted in a logical and physical array based on at least one physical and one activity parameter. Devices or integrated systems which use a fluorescent or visible signal to sort a shuffled or mutagenized nucleic acid library into a spatial array of cells, particles or molecules are also provided. These include, e.g., a physical or logical array comprising one or more shuffled or mutagenized nucleic acids or one or more transcribed shuffled or transcribed mutagenized nucleic acids or one or more in vitro translation reagents.

[0053] The present invention also provides a number of related methods, both for use with the integrated systems and devices of the invention and for use separate from the devices and systems.

[0054] For example, in one class of methods of the invention, methods of processing shuffled or mutagenized nucleic acids are provided. In the methods, a physical (e.g., solid or liquid phase) or logical array of reaction mixtures is provided. A plurality of the reaction mixtures include one or more member of a first population of nucleic acids. The first population of nucleic acids includes one or more shuffled or mutagenized nucleic acids, or one or more transcribed shuffled or mutagenized nucleic acids. A plurality of the plurality of reaction mixtures typically further include an in vitro translation reactant. One or more in vitro translation products produced by a plurality of members of the physical or logical array of reaction mixtures is then detected. Any of the various array configurations noted above or herein for the devices and integrated systems of the invention can be used in these methods.

[0055] For example, in one embodiment, a population of nucleic acids (which can be homologous or heterologous) is physically arrayed on a solid substrate, such as a chip, slide, membrane, or well of a microtiter tray or plate. The arrayed nucleic acids are recombined with one or more additional nucleic acids, thereby providing an arrayed library of recombinant nucleic acids. These recombinant nucleic acids are then amplified and screened to identify members of the array that possess a desired property. In some embodiments, an oligonucleotide primer is tethered to the solid substrate and an additional single-stranded nucleic acid is annealed to the oligonucleotide which is then extended with a nucleic acid polymerase. In alternative embodiments, a single-stranded template polynucleotide is hybridized with a set of partially overlapping complementary nucleic acid fragments which are extended to produce an arrayed library of recombinant nucleic acids. For example, one or more template nucleic acids are immobilized on a solid support. Partially overlapping complementary nucleic acid fragments are annealed to the template polynucleotide, and extended or ligated to produce a heteroduplex comprising the template nucleic acid and a substantially full-length heterolog complementary to the template nucleic acid. The heterolog is recovered and, optionally, further diversified.

[0056] A number of variants of this basic methodology are set forth herein, as are a variety of products produced by the methods and their variants and apparatus and kits for performing the methods.

[0057] For example, the one or more mutated, shuffled or otherwise altered nucleic acids are optionally produced in an automatic DNA shuffling, recombination, or mutation module. Optionally, the method includes inputting DNAs or character strings corresponding to input DNAs into the DNA shuffling module and accepting output DNAs from the DNA shuffling module, where the output DNAs include the one or more mutated, shuffled or otherwise altered nucleic acids in the reaction mixture array. The input DNA in the DNA shuffling module can be cleaved to produce DNA fragments, or the input DNAs can include cleaved or synthetic DNA fragments. DNA fragments, e.g., of a selected length, can be purified in the DNA shuffling module. Purified DNA fragments can be hybridized and elongated with a polymerase. The resulting elongated nucleic acids can be separated, identified, cloned, purified, or the like. A recombination frequency or a length, or both a recombination frequency and a length for the resulting elongated DNAs can be determined, e.g., by detecting incorporation of one or more labeled nucleic acid or nucleotide into the elongated DNAs.

[0058] The invention provides for a variety of physical manipulations of the various reagents and products of the invention, including flowing, e.g., a shuffling reagent or product through a microscale channel in the DNA shuffling module, contacting the components in single or multiple pools, dispensing materials into one or more multiwell plates, dispensing materials into one or more multiwell plates at a selected density per well of the elongated DNAs, dispensing the product elongated DNAs into one or more master multiwell plates and PCR amplifying the resulting master array of elongated nucleic acids to produce an amplified array of elongated nucleic acids, etc. Optionally, the shuffling module includes an array copy system which transfers aliquots from the wells of the one or more master multiwell plates to one or more copy multiwell plates.

[0059] The methods optionally include determining an extent of PCR amplification by any available technique, including, e.g., incorporation of a label into one or more amplified elongated nucleic acid, applying a fluorogenic 5′ nuclease assay or the like.

[0060] In one aspect, the array of reaction mixtures is formed by separate or simultaneous addition of in vitro transcription reagents and an in vitro translation reactant to the one or more copy multiwell plates, or to a duplicate set thereof, wherein the elongated DNAs comprise the one or more mutated, shuffled or otherwise altered nucleic acids. Typically, the array of reaction mixtures produces an array of reaction mixture products, e.g., comprising one or more polypeptide. The methods optionally include re-folding the one or more polypeptide by contacting the one or more polypeptide with a refolding reagent such as guanidine, urea, DTT, DTE, and/or a chaperonin. The one or more polypeptide is optionally combined with one or more lipid to produce one or more liposome or micelle, which liposome or micelle comprises the one or more polypeptide.

[0061] The methods optionally include moving the one or more reaction product array members into proximity to a product identification module, or moving a product identification module into proximity to the reaction product array members. The one or more reaction product array members are optionally flowed into proximity to a product identification module. In-line purification of the one or more reaction product array members can be performed.

[0062] In one aspect, the method further includes reading the array of reaction mixture products with an array reader which detects one or more member of the array of reaction products. In another aspect, one or more member of the array of reaction products is converted with an enzyme into one or more detectable products. Similarly, one or more substrates can be converted by the one or more member of the array of reaction products into one or more detectable products. These detectable products are optionally detected in the array reader.

[0063] A cell can be contacted to one or more member of the array of reaction products, which cell or reaction product, or both, produce a detectable signal upon contacting the one or more member of the array of reaction products.

[0064] A variety of detectable events can be induced, including inducing a reporter gene with one or more member of the array of reaction products, inducing a promoter with one or more member of the array of reaction products which directs expression of one or more detectable products, and inducing an enzyme or receptor cascade with one or more member of the array of reaction products.

[0065] Methods of recombining members of a physical or logical array of nucleic acids are also provided. In the methods, a first population of nucleic acids is provided, or a data structure (e.g., embodied in a computer, an analog computer, a digital computer, or a computer readable medium) comprising character strings corresponding to the first population of nucleic acids is provided. One or more members of the first population of nucleic acids are recombined, thereby providing a first population of recombinant nucleic acids. Alternatively, one or more character strings corresponding to one or more members of the first population of nucleic acids are recombined, thereby providing a population of character strings corresponding to the first population of recombinant nucleic acids. In this embodiment, the population of character strings corresponding to the first population of recombinant nucleic acids is converted into the first population of recombinant nucleic acids, thereby providing the first population of recombinant nucleic acids. In either case, members of the population of recombinant nucleic acids are spatially or logically separated to produce a physical or logical array of recombinant nucleic acids. The recombinant nucleic acids in the physical or logical array of recombinant nucleic acids are amplified in vitro (e.g., by enzymatic or synthetic means) to provide an amplified physical or logical array of recombinant nucleic acids. Alternately, members of the population of recombinant nucleic acids are amplified (or synthesized) and physically or logically separated to produce an amplified physical or logical array of recombinant nucleic acids. Typically, the amplified physical or logical array of recombinant nucleic acids, or a duplicate thereof, is screened for one or more desired property. A variety of variants of this basic class of methods are set forth herein, as are a variety of products produced by the methods and their variants and kits and apparatus for practicing the methods.
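Recombination of character strings corresponding to nucleic acids can be illustrated with a simple template-switching sketch. The function name and the use of explicit crossover positions on two pre-aligned, equal-length strings are assumptions made for illustration; the specification does not prescribe a particular in silico algorithm, and a practical implementation would choose crossover points within regions of sequence similarity.

```python
def recombine_strings(parent_a, parent_b, crossover_points):
    """Build a recombinant string by switching parents at each crossover point.

    Hypothetical helper: parents must be pre-aligned and of equal length.
    Copying starts from `parent_a` and switches template at every position
    listed in `crossover_points`.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to the same length")
    parents = (parent_a, parent_b)
    current = 0      # index of the parent currently being copied
    previous = 0     # start of the segment being copied
    pieces = []
    for point in sorted(crossover_points):
        pieces.append(parents[current][previous:point])
        current = 1 - current   # switch to the other parent
        previous = point
    pieces.append(parents[current][previous:])
    return "".join(pieces)


if __name__ == "__main__":
    print(recombine_strings("AAAAAAAA", "CCCCCCCC", [3, 6]))
```

The resulting character string would then be converted into a physical nucleic acid, e.g., by synthesis, before arraying and screening.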

[0066] Spatially or logically separating members of the population of recombinant nucleic acids to produce a physical or logical array of recombinant nucleic acids or amplified recombinant nucleic acids optionally includes plating the nucleic acids in a microtiter tray at an average of approximately 0.1-10 (e.g., 1-5) array members per well. Optionally, spatially or logically separating the members of the population of recombinant nucleic acids includes diluting the members of the population with a buffer. The concentration of the population of recombinant nucleic acids after dilution is typically about 0.01 to 100 molecules per microliter.
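When molecules are dispensed from a dilute, well-mixed solution, the number landing in each well is commonly modeled as Poisson distributed. That model is an assumption of this sketch, not a statement from the specification, but it shows how an average number of array members per well translates into fractions of empty, singly occupied and multiply occupied wells. The function names are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k molecules in a well under a Poisson model."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def well_occupancy(mean_per_well):
    """Fractions of wells expected to hold zero, one, or more than one member."""
    empty = poisson_pmf(0, mean_per_well)
    single = poisson_pmf(1, mean_per_well)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}


def dispense_volume_ul(mean_per_well, molecules_per_microliter):
    """Volume (microliters) to dispense per well to reach the target average."""
    return mean_per_well / molecules_per_microliter


if __name__ == "__main__":
    print(well_occupancy(1.0))
    print(dispense_volume_ul(1.0, 0.5))
```

Under this model, an average of one member per well leaves roughly a third of wells empty, which is one reason a range of plating densities may be useful.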

[0067] Spatially or logically separating members of the population of recombinant nucleic acids to produce a physical or logical array of recombinant nucleic acids can also include one or more of: (i) lyophilizing members of the population of recombinant nucleic acids on a solid surface, thereby forming a solid phase array; (ii) chemically coupling members of the population of recombinant nucleic acids to a solid surface, thereby forming a solid phase array; (iii) rehydrating members of the population of recombinant nucleic acids on a solid surface, thereby forming a liquid phase array; (iv) cleaving chemically coupled members of the population of recombinant nucleic acids from a solid surface, thereby forming a liquid phase array; and, (v) accessing one or more physically separated logical array members from one or more sources of recombinant nucleic acids and flowing the physically separated logical array members to one or more destination.

[0068] Methods of recombining members of a physical or logical array of nucleic acid are provided. In the methods, at least a first population of nucleic acids is arranged in a physical or logical array. One or more members of the first population of nucleic acids is recombined with one or more additional nucleic acid, thereby providing a first physical or logical array comprising a population of recombined nucleic acids. The recombined nucleic acids in the physical or logical array of recombined nucleic acids are amplified, usually in vitro, to provide an amplified physical or logical array of recombined nucleic acids. The first or amplified physical or logical array of recombined nucleic acids, or one or more duplicate thereof, is then screened for one or more desired properties. As above, a number of variants of this basic class of methods are set forth herein. In some embodiments, the recombination of nucleic acids is performed on a solid substrate such as a slide, membrane or “chip.” For example, a population of nucleic acids is physically arrayed on a solid substrate, such as a chip, slide, membrane, or well of a microtiter tray or plate. The arrayed nucleic acids are recombined with one or more additional nucleic acids, thereby providing an arrayed library of recombinant nucleic acids. These recombinant nucleic acids are then amplified and screened to identify members of the array that possess a desired property. In some embodiments, an oligonucleotide primer is tethered to the solid substrate and an additional single-stranded nucleic acid is annealed to the oligonucleotide which is then extended with a nucleic acid polymerase. In alternative embodiments, a single-stranded template polynucleotide is hybridized with a set of partially overlapping complementary nucleic acid fragments which are extended to produce an arrayed library of recombinant nucleic acids. For example, one or more template nucleic acids are immobilized on a solid support. Partially overlapping complementary nucleic acid fragments are annealed to the template polynucleotide, and extended or ligated to produce a heteroduplex comprising the template nucleic acid and a substantially full-length heterolog complementary to the template nucleic acid. The heterolog is recovered and, optionally, further diversified. A variety of products produced by the methods and their variants and kits and apparatus for practicing the methods are similarly described.

[0069] In the above methods, the first population of nucleic acids or the population of recombinant nucleic acids are typically arranged in a physical or logical matrix at an average of approximately 0.1-10 (e.g., 0.5-5) array members per array position. The first population of nucleic acids or the population of recombinant nucleic acids optionally include a solid phase or a liquid phase array. Optionally, the first population of nucleic acids is provided by one or more of: synthesizing a set of overlapping oligonucleotides, cleaving a plurality of homologous nucleic acids to produce a set of cleaved homologous nucleic acids, step PCR of one or more target nucleic acid, uracil incorporation and cleavage during copying of one or more target nucleic acids, and incorporation of a cleavable nucleic acid analogue into a target nucleic acid and cleavage of the resulting target nucleic acid. In another approach, the first population of nucleic acids is provided by synthesizing a set of overlapping oligonucleotides, by cleaving a plurality of homologous nucleic acids to produce a set of cleaved homologous nucleic acids, or both. The set of overlapping oligonucleotides or the set of cleaved homologous nucleic acids are optionally flowed into one or more selected physical locations.

[0070] The first population of nucleic acids is optionally provided by sonicating, cleaving, partially synthesizing, random primer extending or directed primer extending one or more of: a synthetic nucleic acid, a DNA, an RNA, a DNA analogue, an RNA analogue, a genomic DNA, a cDNA, an mRNA, a DNA generated by reverse transcription, an nRNA, an aptamer, a polysome associated nucleic acid, a cloned nucleic acid, a cloned DNA, a cloned RNA, a plasmid DNA, a phagemid DNA, a viral DNA, a viral RNA, a YAC DNA, a cosmid DNA, a fosmid DNA, a BAC DNA, a P1-mid, a phage DNA, a single-stranded DNA, a double-stranded DNA, a branched DNA, a catalytic nucleic acid, an antisense nucleic acid, an in vitro amplified nucleic acid, a PCR amplified nucleic acid, an LCR amplified nucleic acid, a Qβ-replicase amplified nucleic acid, an oligonucleotide, a nucleic acid fragment, a restriction fragment and/or a combination thereof.

[0071] The first population of nucleic acids is optionally modified by purifying one or more member of the first population of nucleic acids. Optionally, the first population of nucleic acids is provided by transporting one or more members of the population from one or more sources of one or more members of the first population to one or more destinations of the one or more members of the first population of nucleic acids. For example, the transporting optionally includes flowing the one or more members from the source to the destination. The one or more sources of nucleic acids can include any of: a solid phase array, a liquid phase array, a container, a microtiter tray, a microtiter tray well, a microfluidic chip, a test tube, a centrifugal rotor, a microscope slide, and/or a combination thereof.

[0072] Amplifying the recombinant nucleic acids in the physical or logical array of recombinant nucleic acids, or amplifying the elongated nucleic acids in the master array optionally includes one or more amplification technique selected from: PCR, LCR, SDA, NASBA, TMA and Qβ-replicase amplification. Optionally, amplifying the recombinant nucleic acids in the physical or logical array or amplifying the elongated nucleic acids in the master array comprises heating or cooling the physical or logical array or the master array, or a portion thereof.

[0073] Amplifying the recombinant nucleic acids in the physical or logical array or amplifying the elongated nucleic acids in the master array can include incorporating one or more transcription or translation control subsequence into one or more of: the elongated nucleic acids, the recombinant nucleic acids in the physical or logical array, an intermediate nucleic acid produced using the elongated nucleic acids or the recombinant nucleic acids in the physical or logical array as a template, or a partial or complete copy of the elongated nucleic acids or the recombinant nucleic acids in the physical or logical array. The one or more transcription or translation control subsequence is optionally ligated into one or more of: the elongated nucleic acids, the recombinant nucleic acids in the physical or logical array, an intermediate nucleic acid produced using the elongated nucleic acids or the recombinant nucleic acids in the physical or logical array as a template, and a partial or complete copy of the elongated nucleic acids or the recombinant nucleic acids in the physical or logical array. The one or more transcription or translation control subsequence is optionally hybridized or partially hybridized to one or more of: the elongated nucleic acids, the recombinant nucleic acids in the physical or logical array, an intermediate nucleic acid produced using the elongated nucleic acids or the recombinant nucleic acids in the physical or logical array as a template, or a partial or complete copy of the elongated nucleic acids or the recombinant nucleic acids in the physical or logical array.

[0074] In one aspect, the recombinant nucleic acids in the physical or logical array or the elongated nucleic acids in the master array are amplified in a DNA micro-amplifier. The micro-amplifier can include one or more of: a programmable resistor, a micromachined zone heating chemical amplifier, a chemical denaturation device, an electrostatic denaturation device, and/or a microfluidic electrical fluid resistance heating device. Similarly, the physical or logical array, or portion thereof, or the master array, or portion thereof, is heated or cooled by one or more of: a Peltier solid state heat pump, a heat pump, a resistive heater, a refrigeration unit, a heat sink, and a Joule-Thomson cooling device. The methods optionally include producing a duplicate amplified physical or logical array of recombinant nucleic acids.

[0075] The methods can similarly include in vitro transcribing members of the amplified physical or logical array of recombinant nucleic acids to produce an amplified array of in vitro transcribed nucleic acids. In one aspect, screening the amplified physical or logical array of recombinant nucleic acids, or a duplicate thereof, for a desired property comprises assaying a protein or product nucleic acid encoded by one or more members of the amplified physical or logical array of recombinant nucleic acids for one or more property.

[0076] In one aspect, the invention provides recombination of nucleic acids using a single-stranded template. In the methods, a first population of single-stranded template polynucleotides is provided. The template polynucleotides are the same or different. The templates are recombined by: (i) annealing a plurality of partially overlapping complementary nucleic acid fragments; and, (ii) extending the annealed fragments to produce a physical or logical array comprising a first population of recombinant nucleic acids. In one embodiment, a physical array comprising the first population of template polynucleotides is provided immobilized on a solid support (e.g., a glass support, a plastic support, a silicon support, a chip, a bead, a pin, a filter, a membrane, a microtiter plate, a slide or the like). In one embodiment, the first population of template polynucleotides comprises substantially an entire genome (e.g., a bacterial or fungal genome). In another embodiment, the first population of template polynucleotides comprises substantially all of the expression products of a cell (e.g., eukaryotic or prokaryotic), tissue or organism. Optionally, the first population of template polynucleotides comprises a subset of the expression products of a cell, tissue or organism. The first population of template polynucleotides optionally comprises a library of genomic nucleic acids or cellular expression products (e.g., mRNAs, cDNAs, etc.).

[0077] The template polynucleotides optionally include one or more of: a coding RNA, a coding DNA, an antisense RNA, an antisense DNA, a non-coding RNA, a non-coding DNA, an artificial RNA, an artificial DNA, a synthetic RNA, a synthetic DNA, a substituted RNA, a substituted DNA, a naturally occurring RNA, a naturally occurring DNA, a genomic RNA, a genomic DNA, a cDNA, or the like.

[0078] In one aspect, members of the amplified physical or logical arrays of recombinant nucleic acids herein are transcribed to produce an amplified array of transcribed nucleic acids. These can be translated to produce an amplified physical or logical array of polypeptides. The concentration of polypeptide or transcribed nucleic acids can be determined at one or more positions in the amplified physical or logical array of polypeptides.

[0079] In one aspect, the invention provides for re-arraying the amplified physical or logical array of polypeptides or in vitro transcribed nucleic acids in a secondary polypeptide or in vitro transcribed nucleic acid array which has an approximately uniform concentration of polypeptides or in vitro transcribed nucleic acids at a plurality of locations in the secondary polypeptide array. Alternately, or in conjunction, a correction factor which accounts for variation in polypeptide or in vitro transcribed nucleic acid concentrations at different positions in the amplified physical or logical array of polypeptides or in vitro transcribed nucleic acids can be applied to normalize detectable data.
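One way to picture such a correction factor is to scale each position's signal by the ratio of a reference concentration to the concentration measured at that position. The sketch below uses the mean concentration across the array as the reference; that choice, the function name and the dictionary layout keyed by well position are assumptions of this example rather than details given in the specification.

```python
def normalize_signals(raw_signal, concentration):
    """Scale each position's signal by (reference concentration / local concentration).

    Hypothetical helper: both arguments are dicts keyed by array position.
    The reference is the mean concentration over all positions. Positions
    with a non-positive concentration cannot be corrected and map to None.
    """
    reference = sum(concentration.values()) / len(concentration)
    normalized = {}
    for position, signal in raw_signal.items():
        conc = concentration[position]
        if conc <= 0:
            normalized[position] = None
            continue
        correction = reference / conc
        normalized[position] = signal * correction
    return normalized


if __name__ == "__main__":
    raw = {"A1": 10.0, "A2": 10.0}
    conc = {"A1": 1.0, "A2": 3.0}
    print(normalize_signals(raw, conc))
```

In this example both wells give the same raw signal, but the well with one third of the product concentration receives the higher corrected value, which is the intent of normalizing by concentration.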

[0080] In one aspect, one or more substrate is added to a plurality of members of the logical array of polypeptides or in vitro transcribed nucleic acids. Formation of a product produced by contact between the one or more substrate and one or more of the plurality of members of the logical array of polypeptides can be monitored, directly or indirectly. Formation of the product is detected, e.g., by a coupled enzymatic reaction which detects the product or the substrate or a secondary product of the product or substrate. For example, peroxide production can be monitored. Similarly, formation of the product is optionally detected by monitoring production of heat or entropy which results from the formation of the product.

[0081] The physical or logical array of polypeptides is optionally selected for a desired property, thereby identifying one or more selected member of the physical or logical array of polypeptides which has a desired property, and identifying one or more selected member of the amplified physical or logical array of recombinant nucleic acids that encodes the one or more member of the physical or logical array of polypeptides. For example, the selecting is optionally performed in a primary screening assay, comprising one or more of: (i) re-selecting the one or more selected member of the amplified physical or logical array of recombinant nucleic acids in a secondary screening assay; (ii) quantifying protein levels at one or more location in the physical or logical array of polypeptides; (iii) purifying proteins from one or more locations in the physical or logical array of polypeptides; (iv) normalizing activity levels in the primary screen by compensating for protein quantitation at a plurality of locations in the physical or logical array of polypeptides; (v) determining a physical characteristic of the one or more selected members; and, (vi) determining an activity of the one or more selected members. In a further aspect, the one or more selected member of the amplified physical or logical array of recombinant nucleic acids are recombined with one or more additional nucleic acids, in vivo, in vitro or in silico.

[0082] One or more member of the amplified physical or logical array, ora duplicate thereof, can be selected based upon the screening of theamplified physical or logical array for a desired property. Optionally,a plurality of members of the amplified physical or logical array orduplicate thereof are selected, recombined and re-arrayed to form asecondary array of recombined selected nucleic acids, which secondaryarray is re-screened for the desired property, or for a second desiredproperty.

[0083] Methods of detecting or enriching for in vitro transcription or translation products are also provided. In the methods, one or more first nucleic acids which encode one or more moieties are localized proximal to one or more moiety recognition agents which specifically bind the one or more moieties. The one or more nucleic acids are in vitro translated or transcribed, producing the one or more moieties (e.g., polypeptides or biologically active RNAs such as anti-sense or ribozyme molecules, or other product molecules). The one or more moieties diffuse or flow into contact with the one or more moiety recognition agents. Binding of the one or more moieties to the one or more moiety recognition agents is permitted and the one or more moieties are detected or enriched for by detecting or collecting one or more materials proximal to, within or contiguous with the moiety recognition agent (the material comprises at least one of the one or more moieties, where the moieties comprise one or more in vitro translation or transcription product). Optionally, the one or more moieties are pooled by pooling the material which is collected. Here again, a variety of variants of this basic class of methods are set forth herein as are a variety of products produced by the methods and their variants.

[0084] Optionally, the one or more moieties (e.g., polypeptides or RNAs) are pooled by pooling the material which is collected. The moiety recognition agents noted above optionally include one or more antibody or one or more second nucleic acids. The first nucleic acids optionally include a related population of mutated, shuffled or otherwise altered nucleic acids. In another aspect, the first nucleic acids optionally include a related population of mutated, shuffled or otherwise altered nucleic acids which encode an epitope tag bound by the moiety or the one or more moiety recognition agents.

[0085] In one aspect, the first nucleic acids optionally comprise a related population of mutated, shuffled or otherwise altered nucleic acids and a PCR primer binding region. In this embodiment, the method further includes identifying one or more target first nucleic acid by proximity to the moieties which are bound to the one or more moiety recognition agent, and amplifying the target first nucleic acid by hybridizing a PCR primer to the PCR primer binding region and extending the primer with a polymerase. The method optionally includes PCR amplifying a set of parental nucleic acids to produce the related population of mutated, shuffled or otherwise altered nucleic acids.

[0086] In one typical embodiment, the first nucleic acids comprise an inducible or constitutive heterologous promoter. The first nucleic acids and the one or more moiety recognition agents are typically localized on a solid substrate (e.g., a bead, chip, slide or the like). In one embodiment, the first nucleic acids and the one or more moiety recognition agents are localized on the solid substrate by one or more of: a cleavable linker chemical linker, a gel, a colloid, a magnetic field, and an electrical field.

[0087] An activity of the moiety or moiety recognition agent is typically detected and the one or more first nucleic acid coupled to the moiety or moiety recognition agent is picked with an automated robot, e.g., by placing a capillary on a region comprising the detected activity of the moiety or moiety recognition agent. The moiety or moiety in contact with the moiety recognition agent is optionally cleaved at a cleavable linker which attaches the first nucleic acid to a solid substrate, providing for isolation of the first nucleic acid.

[0088] Methods of producing duplicate arrays of shuffled or mutagenized nucleic acids are provided. In the methods, a physical or logical array of shuffled or mutagenized nucleic acids or transcribed shuffled or transcribed mutagenized nucleic acids is provided. A duplicate array of copies (generated, e.g., using a polymerase or nucleic acid synthesizer) of the shuffled or mutagenized nucleic acids or copies of the transcribed shuffled or transcribed mutagenized nucleic acids is formed by physically or logically organizing the copies into a physical or logical array. Once again, a variety of variants of this basic class of methods are set forth herein, as are a variety of products produced by the methods and their variants.

[0089] In one aspect, an array of reaction mixtures which corresponds to the physical or logical array of shuffled or mutagenized nucleic acids or transcribed shuffled or transcribed mutagenized nucleic acids is formed. The reaction mixtures include members of the array of shuffled or mutagenized nucleic acids or transcribed shuffled or transcribed mutagenized nucleic acids or the duplicate array of copies of the shuffled or mutagenized nucleic acids or copies of the transcribed shuffled or transcribed mutagenized nucleic acids, or a derivative copy thereof. The reaction mixtures typically further include one or more in vitro transcription or translation reagent.

[0090] Methods of normalizing an array of reaction mixtures are provided. In the methods, a physical or logical array of diversified (e.g., shuffled or mutagenized) nucleic acids or transcribed shuffled or transcribed mutagenized nucleic acids is in vitro transcribed or translated to produce an array of products. A correction factor is determined which accounts for variation in concentration of the products at different sites in the array of products. Typically, a secondary product array is produced which comprises selected concentrations of the products at one or more sites in the secondary array, e.g., by transferring aliquots from a plurality of sites in the array of products to a plurality of secondary sites in the secondary array. Optionally, the products are diluted while being transferred or after transfer to the secondary sites, thereby selecting the concentration of the products at the secondary sites in the secondary array.
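
By way of illustration only, the normalization described above might be sketched as follows. The function names, the site labels and the choice of a mean reference concentration are assumptions made for this example and are not taken from the specification; the sketch simply computes a per-site correction factor and a dilution plan for a secondary array.

```python
def correction_factors(concentrations, reference=None):
    """Per-site factors that rescale a measured activity to a common
    reference concentration (hypothetical helper, not from the text)."""
    measured = {site: c for site, c in concentrations.items() if c > 0}
    if not measured:
        raise ValueError("no site has a measurable product concentration")
    if reference is None:
        # Assumption: use the mean concentration as the reference.
        reference = sum(measured.values()) / len(measured)
    return {site: reference / c for site, c in measured.items()}


def dilution_plan(concentrations, target, final_volume_ul):
    """For each site, the aliquot and diluent volumes (in microliters)
    that give `target` concentration at the secondary site."""
    plan = {}
    for site, c in concentrations.items():
        if c < target:
            continue  # dilution cannot raise a concentration; site skipped
        aliquot = final_volume_ul * target / c
        plan[site] = (aliquot, final_volume_ul - aliquot)
    return plan
```

For example, with sites at 2.0 and 4.0 concentration units and a target of 1.0 in a 100 microliter secondary well, the plan transfers 50 and 25 microliters respectively and makes up the remainder with diluent.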

[0091] In one aspect, the invention provides methods of directing nucleic acid fragmentation using a computer. The method includes calculating a ratio of uracil to thymidine, which ratio when used in a fragmentation module produces one or more nucleic acid fragment of a selected length.
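
One simple way such a ratio could be estimated is sketched below. The model is an assumption made for illustration, not the calculation of the specification: it supposes that uracil and thymidine are incorporated with equal efficiency, that cleavage occurs at every incorporated uracil, and that thymidine positions make up a fixed fraction (`t_fraction`, a hypothetical parameter) of the strand, so that the mean single-strand fragment length is roughly 1/(t_fraction × p), where p is the fraction of thymidine positions occupied by uracil.

```python
def uracil_to_thymidine_ratio(target_length, t_fraction=0.25):
    """Estimate the U:T ratio giving a mean fragment length of
    `target_length` bases under the simplified model described above."""
    if target_length <= 0 or not 0 < t_fraction <= 1:
        raise ValueError("target_length must be > 0 and 0 < t_fraction <= 1")
    # Fraction of T positions that must carry uracil.
    p = 1.0 / (t_fraction * target_length)
    if p >= 1.0:
        raise ValueError("target length too short for this T content")
    # Convert the fraction U/(U+T) into the ratio U:T.
    return p / (1.0 - p)
```

Under these assumptions, a 100 base target with 25% thymidine content gives p = 0.04, i.e., a ratio of about 1 uracil per 24 thymidines. Empirical calibration would be expected to replace this estimate in practice.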

[0092] In another aspect, methods of directing PCR using a computer are provided. The method includes calculating one or more crossover region between two or more parental nucleic acid sequence using one or more annealing temperature or extension temperature. For example, the method optionally includes calculating the one or more crossover region using one or more theoretical prediction or one or more set of empirical data to calculate a melting temperature.

[0093] Methods of selecting one or more parental nucleic acids for diversity generation using a computer are also provided. In the method, an alignment between two or more potential parental nucleic acid sequences is performed. A number of mismatches between the aligned sequences is calculated and a melting temperature for one or more window of w bases in the alignment is calculated. One or more window of w bases having a melting temperature greater than x is determined and one or more crossover segment in the alignment is identified, which one or more crossover segment comprises two or more windows having a melting temperature greater than x, which two or more windows are separated by no more than n nucleotides. A dispersion of the one or more crossover segments is calculated and a first score for each alignment based on the number of windows having a melting temperature greater than x, the dispersion, and the number of crossover segments identified is calculated. A second score based on the number of mismatches, the number of windows having a melting temperature greater than x, the dispersion, and the number of crossover segments identified is determined, and one or more parental nucleic acid is selected based on the first score and/or the second score. These steps are optionally repeated, e.g., starting with the one or more parental nucleic acid which are selected.

[0094] In this method, the alignment optionally comprises a pairwise alignment. W optionally comprises an odd number, e.g., about 21. The method optionally includes calculating the melting temperature for the one or more window of w bases in the alignment from one or more set of empirical data or one or more melting temperature prediction algorithm. Example values for x include about 65° C. Example values for n include about 2. In the methods, the dispersion typically comprises the inverse of the average number of bases between crossover segments in the alignment.
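
The steps of this method can be sketched, under stated assumptions, as follows. The sketch takes two pre-aligned, equal-length sequences; the melting temperature function (`wallace_tm`) is a crude stand-in based on the 2/4 rule applied to matched positions, not the empirical data or prediction algorithm contemplated above; the grouping of windows by start position and the weighted-sum scores are likewise hypothetical choices, since the text does not fix how the quantities are combined.

```python
def wallace_tm(a, b):
    """Crude stand-in Tm: 4 per matched G/C, 2 per matched A/T."""
    tm = 0
    for p, q in zip(a, b):
        if p == q and p in "ACGT":
            tm += 4 if p in "GC" else 2
    return tm


def crossover_stats(seq_a, seq_b, w=21, x=65.0, n=2, tm_func=wallace_tm):
    """Mismatches, windows with Tm > x, crossover segments, dispersion."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    mismatches = sum(1 for p, q in zip(seq_a, seq_b) if p != q)
    # Start positions of windows whose melting temperature exceeds x.
    hot = [i for i in range(len(seq_a) - w + 1)
           if tm_func(seq_a[i:i + w], seq_b[i:i + w]) > x]
    # Assumption: windows whose starts are no more than n apart belong
    # to one segment; a segment needs at least two such windows.
    segments, run = [], []
    for i in hot:
        if run and i - run[-1] > n:
            if len(run) >= 2:
                segments.append((run[0], run[-1] + w))
            run = []
        run.append(i)
    if len(run) >= 2:
        segments.append((run[0], run[-1] + w))
    # Dispersion: inverse of the average number of bases between segments.
    gaps = [max(0, s2[0] - s1[1]) for s1, s2 in zip(segments, segments[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    dispersion = 1.0 / mean_gap if mean_gap > 0 else 0.0
    return {"mismatches": mismatches, "hot_windows": len(hot),
            "segments": segments, "dispersion": dispersion}


def alignment_scores(stats, weights=(1.0, 1.0, 1.0, 1.0)):
    """Hypothetical weighted sums standing in for the two scores."""
    w_win, w_disp, w_seg, w_mis = weights
    first = (w_win * stats["hot_windows"] + w_disp * stats["dispersion"]
             + w_seg * len(stats["segments"]))
    second = first - w_mis * stats["mismatches"]
    return first, second
```

Candidate parental pairs could then be ranked by either score, and the ranking repeated starting from the selected parents.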

[0095] Typically, the instruction set selects the two or more potential parental nucleic acid sequences by searching one or more database for one or more nucleic acid sequence of interest and one or more homolog of the one or more nucleic acid sequence of interest.

[0096] The invention further provides embodiments in a web page, e.g., for directing nucleic acid diversity generation, the web page comprising a computer readable medium that causes a computer to perform any of the methods herein.

[0097] Products produced by any of the processes herein are a feature of the invention.

[0098] Kits embodying the methods and comprising various components of the device/apparatus/integrated systems herein are also provided. Use of the methods and/or device/systems for any of the purposes indicated herein is also a feature of the invention.

BRIEF DESCRIPTION OF THE FIGURES

[0099] FIG. 1, Panels A and B is a schematic flow chart of an integrated system of the invention, beginning with input nucleic acids.

[0100] FIG. 2 provides an example schematic of the modules of an integrated shuffling machine.

[0101] FIG. 3 provides a schematic representation of the steps performed by an exemplar shuffling module. As shown, a single pot reaction is performed, utilizing uracil incorporation, DNA fragmentation and assembly. A rescue PCR is performed, the results assessed with PicoGreen and any wells that test positive for PicoGreen incorporation are rescued and sent to the library quality modules.

[0102] FIG. 4 provides a schematic overview of an exemplar Library Quality Module.

[0103] FIG. 5 provides a schematic overview of an exemplar dilution module's activities.

[0104] FIG. 6 provides a schematic overview of the activities of an exemplar expression module.

[0105] FIG. 7 provides a schematic overview of the activities of an exemplar assay module.

[0106] FIG. 8 is a schematic of an example recombination and selection machine.

[0107] FIG. 9, panels A-B provide a schematic illustration of various detection strategies using single or multiple primers (e.g., via TaqMan).

[0108] FIG. 10 is a schematic of an example DNA shuffling machine.

[0109] FIG. 11 is a schematic of a DNA fragmentation device or module.

[0110] FIG. 12 is a schematic of a DNA fragment analysis and isolation device or module.

[0111] FIG. 13 is a schematic of a DNA fragment prep device.

[0112] FIG. 14 is a schematic of a precision microamplifier.

[0113] FIG. 15 is a schematic of a DNA assembly and rescue module.

[0114] FIG. 16 is a schematic of a recombination analysis module.

[0115] FIG. 17, panels A-E is a schematic of exemplar enrichment methods for in vitro transcription/translation.

[0116] FIG. 18 is a schematic of a high-throughput parallel SPR module.

[0117] FIG. 19 is a schematic of a shuffling chip.

[0118] FIG. 20 is a schematic of the fluidics layer of a shuffling system.

[0119] FIG. 21 is a schematic of an environmental control layer.

[0120] FIG. 22 is a schematic of a microscale appliance.

[0121] FIG. 23 is a schematic outline of processes for sourcing nucleic acids from diverse sources.

[0122] FIG. 24 is an alternative schematic outline of processes for sourcing nucleic acids from diverse sources.

[0123] FIG. 25 is an alternative schematic outline of processes for sourcing nucleic acids from diverse sources.

[0124] FIG. 26 is an alternative schematic outline of processes for sourcing nucleic acids from diverse sources.

[0125] FIG. 27 is an alternative schematic outline of processes for sourcing nucleic acids from diverse sources.

[0126] FIG. 28 is an alternative schematic outline of processes for sourcing nucleic acids from diverse sources.

[0127] FIG. 29 is an alternative schematic outline of processes for sourcing nucleic acids from diverse sources.

[0128] FIG. 30 is an alternative schematic outline of processes for sourcing nucleic acids from diverse sources.

[0129] FIG. 31 schematically illustrates recombination of nucleic acids tethered to a solid support.

[0130] FIGS. 32A and B schematically illustrate recovery procedures using “boomerang” and “vectorette” amplification strategies.

[0131] FIG. 33 is an illustration of the melting temperature for a nucleic acid pairwise hybridization showing various crossover segments.

I. DEFINITIONS

[0132] The following definitions supplement those common in the art for the terms specified.

[0133] A “physical array” is a set of specified elements arranged in a specified or specifiable spatial arrangement. A “logical array” is a set of specified elements arranged in a manner which permits access to the elements of the set. A logical array can be, e.g., a virtual arrangement of the set in a computer system, or, e.g., an arrangement of set elements produced by performing a specified physical manipulation on one or more set element or components of set elements. For example, a logical array can be described in which set elements (or components that can be combined to produce set elements) can be transported or manipulated to produce the set. A “duplicate” or “copy” array is an array which can be at least partially corresponded to a parental array. In simplest form, this correspondence takes the form of simply replicating all or part of the parental array, e.g., by taking an aliquot of material from each position in the parental array and placing the aliquot in a defined position in the duplicate array. However, any method which results in the ability to correspond members of the duplicate array to the parental array can be used for array duplication, including the use of simple or complex storage algorithms, partially or purely in silico arrays, and pooling approaches which partially combine some elements of the parental array into single locations (physical or virtual) in the duplicate array. The duplicate or copy array duplicates some or all components of a parental array. For example, an array of reaction mixtures optionally includes nucleic acids and translation or transcription reagents at sites in the array, while the duplicate/copy array can also include the complete reaction mixtures, or, alternately, can include, e.g., the nucleic acids, without the other reaction mixture components.
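
As a purely illustrative sketch of the correspondence between a duplicate array and its parental array, the mapping below records which parental position(s) fed each duplicate position, including a simple pooling case. The position labels (`D1`, `D2`, ...) and function names are hypothetical and chosen only for this example.

```python
def make_duplicate(parent_positions, pool_size=1):
    """Map each duplicate position to the parental position(s) it was
    filled from; pool_size > 1 models a pooled duplicate array."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    mapping = {}
    for k in range(0, len(parent_positions), pool_size):
        label = f"D{k // pool_size + 1}"
        mapping[label] = list(parent_positions[k:k + pool_size])
    return mapping


def trace_to_parent(mapping, duplicate_position):
    """Return the parental positions corresponding to a duplicate site."""
    return mapping[duplicate_position]
```

With `pool_size=1` the duplicate is a one-to-one replica; with larger values each duplicate site corresponds to several parental sites, which is the kind of partial correspondence the definition permits.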

[0134] A “shuffled” nucleic acid is a nucleic acid produced by a shuffling procedure such as any shuffling procedure set forth herein. Shuffled nucleic acids are produced by recombining (physically or virtually) two or more nucleic acids (or character strings), e.g., in an artificial, and optionally recursive, fashion. Generally, one or more screening steps are used in shuffling processes to identify nucleic acids of interest; this screening step can be performed before or after any recombination step. In some (but not all) shuffling embodiments, it is desirable to perform multiple rounds of recombination prior to selection to increase the diversity of the pool to be screened. The overall process of recombination and selection is optionally repeated recursively. Depending on context, shuffling can refer to an overall process of recombination and selection, or, alternately, can simply refer to the recombinational portions of the overall process.

[0135] A “mutagenized nucleic acid” is a nucleic acid which has been physically altered as compared to a parental nucleic acid (e.g., such as a naturally occurring nucleic acid), e.g., by modifying, deleting, rearranging, or replacing one or more nucleotide residue in the mutagenized nucleic acid as compared to the parental nucleic acid.

[0136] A “transcribed” nucleic acid is a nucleic acid produced by copying a parental nucleic acid, where the parental nucleic acid is a different nucleic acid type than the copied nucleic acid. For example, an RNA copy of a DNA molecule (e.g., as occurs during classical transcription) or a DNA copy of an RNA molecule (e.g., as occurs during classical reverse transcription) can be a “transcribed nucleic acid” as that term is intended herein. Similarly, artificial nucleic acids, including peptide nucleic acids, can be used as either the parental or the copied nucleic acid (and artificial nucleotides can be incorporated into either parental or copied molecules). Copying can be performed, e.g., using appropriate polymerases, or using in vitro artificial chemical synthetic methods, or a combination of synthetic and enzymatic methods.

[0137] An “in vitro translation reagent” is a reagent which is necessary or sufficient for in vitro translation, or a reagent which modulates the rate or extent of an in vitro translation reaction, or which alters the parameters under which the reaction is operative. Examples include ribosomes, and reagents which include ribosomes, such as reticulocyte lysates, bacterial cell lysates, cellular fractions thereof, amino acids, t-RNAs, etc.

[0138] A “translation product” is a product (typically a polypeptide) produced as a result of the translation of a nucleic acid. A “transcription product” is a product (e.g., an RNA, optionally including mRNA, or, e.g., a catalytic or biologically active RNA) produced as a result of transcription of a nucleic acid.

[0139] A “solid phase array” is an array in which the members of the array are fixed to or within a solid or semi-solid substrate. The fixation can be the result of any interaction that tends to immobilize components, including chemical linking, heat treatment, hybridization, ligand/receptor interactions, metal chelation interactions, ion exchange, hydrogen bonding and hydrophobic interactions and the like. For semi-solid substrates such as gels and gel droplets, linking may require nothing more than mixing of the member with the substrate material during or after solidification. A “solid substrate” has a fixed organizational support matrix, such as silica, glass, polymeric materials, membranes, filters, beads, pins, slides, microtiter plates or trays, etc. In some embodiments, at least one surface of the substrate is partially planar, but in others, the solid substrate is a discrete element such as a bead which can be dispensed into an organization matrix such as a microtiter tray. Solid support materials include, but are not limited to, glass, polyacryloylmorpholide, silica, controlled pore glass (CPG), polystyrene, polystyrene/latex, polyacrylate, polyacrylamide, agar, agarose, chemically modified agars and agaroses, carboxyl modified teflon, nylon and nitrocellulose. The solid substrates can be biological, nonbiological, organic, inorganic, or a combination of any of these, existing as particles, strands, precipitates, gels, sheets, tubing, spheres, containers, capillaries, pads, slices, films, plates, slides, etc., depending upon the particular application. Other suitable solid substrate materials will be readily apparent to those of skill in the art. Often, the surface of the solid substrate will contain reactive groups, such as carboxyl, amino, hydroxyl, thiol, or the like for the attachment of nucleic acids, proteins, etc. Surfaces on the solid substrate will sometimes, though not always, be composed of the same material as the substrate. Thus, the surface may be composed of any of a wide variety of materials, for example, polymers, plastics, resins, polysaccharides, silica or silica-based materials, carbon, metals, inorganic glasses, membranes, or any of the above-listed substrate materials. The surface may also be chemically modified or functionalized in such a way as to allow it to establish binding interactions with functional groups intrinsic to or specifically associated with the nucleic acids or polypeptides to be immobilized.

[0140] A “liquid phase array” is an array in which the members of the array are free in solution, e.g., on a microtiter tray, or in a series of containers such as a set of test tubes or other containers. Most often, members of a liquid phase array are separated in space by subdividing the volume containing the members of the array into multiple discrete chambers such that each chamber contains less than a complete library of members, and ideally less than about 10% of the discrete members in the library. Such separation or fractionation of a population containing a plurality of unique sequences can be accomplished by sorting, dilution, serial dilution, and a variety of other methods.

[0141] Nucleic acids are “homologous” when they derive (artificially or naturally) from a common ancestor. Where there is no direct knowledge of the relatedness of two or more nucleic acids, homology is often inferred by consideration of the percent identity or by identification of discrete sequence motifs within sets of low identity sequences of the relevant nucleic acids. As described in more detail herein, commonly available software programs such as BLAST and PILEUP can be used to calculate relatedness of nucleic acids.

[0142] Nucleic acids “hybridize” when they preferentially associate in solution. As described in more detail below, a variety of parameters such as temperature, ionic buffer conditions and the presence or absence of organic solvents affect hybridization of two or more nucleic acids.

[0143] A “translation control sequence” is a nucleic acid subsequence which affects the initiation, rate or extent of translation of a nucleic acid, such as ribosome binding sites, stop codons and the like. A variety of such sequences are known and described in the references set forth herein and many more are fully available to one of skill.

[0144] A “transcription control sequence” is a nucleic acid subsequence which affects the initiation, rate or extent of transcription of a nucleic acid, such as a promoter, enhancer or terminator sequences. A variety of such sequences are known and described in the references set forth herein, and many more are fully available to one of skill.

DETAILED DISCUSSION OF THE INVENTION

[0145] The present invention takes advantage of a variety of technologies to automate nucleic acid shuffling and other diversity-generation dependent processes. Each aspect of diversity generation and downstream screening processes can be automated (and used individually in separate modules or collectively in an integrated system or an overall device), providing devices, systems and methods which greatly increase throughput for generating diverse nucleic acids (e.g., by recombination methods such as DNA shuffling, or via other mutagenesis methods, or combinations thereof) and screening for desirable properties of those nucleic acids (e.g., encoded RNAs, proteins, or the like).

[0146] The invention provides, among other things, methods, kits, devices and integrated systems. For example, devices and integrated systems comprising a physical or logical array of reaction mixtures are provided. Each reaction mixture comprises one or more recombinant, shuffled or otherwise diversified nucleic acids (e.g., diversified by mutagenesis, optionally including recombination or other methods), or corresponding transcribed nucleic acids (e.g., cDNAs or mRNAs). The reaction mixtures of the array also include one or more in vitro transcription and/or translation reagents.

[0147] As will be described in more detail below, arrays can be, and commonly are, partially or completely duplicated in the methods and systems of the invention. For example, aliquots of reaction mixtures or products can be taken and copy arrays formed from the aliquots. Similarly, master arrays comprising, e.g., the nucleic acids found in the reaction mixtures (e.g., arrays constituted of duplicate amplified sets of diversified nucleic acids) can be produced. The precise manner of production of array copies varies according to the physical nature of the array. For example, where arrays are formed in microtiter trays, copy arrays are conveniently formed in microtiter trays, e.g., by automated pipetting of aliquots of material from an original array. However, arrays can also change form in the copying process, i.e., liquid phase copies can be formed from solid phase arrays, or vice versa, or a logical array can be converted to a simple or complex spatial array in the process of forming the copy (e.g., by moving or creating an aliquot of material corresponding to a member of the logical array, and, subsequently, placing the aliquot with other array members in an accessible spatial relationship such as a gridded array), or vice versa (e.g., array member positions can be recorded and that information used as the basis for logical arrays that constitute members of multiple spatial arrays, a common process when identifying “hits” having an activity of interest).

[0148] The arrays can include both reaction mixture and product components. For example, in addition to the nucleic acids, transcription reagents and translation reagents noted above, the arrays can also include products of the reaction mixture such as RNAs (e.g., mRNAs, biologically active nucleic acids (e.g., ribozymes, aptamers, antisense molecules, etc.)), proteins, or the like. Thus, the reaction mixtures can comprise one or more translation products or one or more transcription products, or both.

[0149] Similarly, the arrays can have any of a variety of physical configurations, including solid or liquid phase(s). Some or all of the components of the reaction mixtures can be fixed in position, e.g., the nucleic acids in the reaction mixtures can be relatively fixed in position (e.g., in a solid or immobilized phase), while the other components of the array can diffuse across the array (e.g., through a gel or other immobilizing matrix). Alternatively, some or all of the members of the array can be immobilized to a single general spatial location (e.g., by being present in wells of a microtiter dish, either by being fixed to the surface of the dish or in solution in the wells of the dish). Thus, the array of reaction mixtures can comprise a solid phase or a liquid phase array of any of the components of the reaction mixtures, e.g., the diversified nucleic acids (or transcribed products thereof), in vitro translation reagents, etc.

[0150] I. An Overview of Integrated Diversity Generation/Screening Systems

[0151] FIG. 1, panels A and B provides a schematic overview of an example integrated system of the invention. In some contexts, some of the listed elements are omitted; conversely, many additional elements are optionally included.

[0152] As shown, nucleic acids (DNA, RNA, etc.) or corresponding character strings (e.g., characters in a computer system) are input into the system. A diversity generation module (e.g., a shuffling and/or mutagenesis module) recombines, mutagenizes or otherwise modifies the input nucleic acids to produce a diverse set of nucleic acids that are used to produce one or more product (a protein, bioactive RNA, or the like) in a product production module. Variant nucleic acids are then selected (typically by screening products from the production module) for a desired encoded activity (encoded protein or RNA, level of RNA expression, level of protein expression, etc.). Top variants are then selected for further characterization or for additional rounds of diversity generation (e.g., recombination of the top variants with each other or with additional nucleic acids, or both).

[0153] Typically, a product quantification module can be used to normalize selection results (i.e., to account for differences in concentrations of protein, catalytic RNAs or other products). Optionally, one or more additional secondary assay can be performed to further select for one or more additional property of interest in any product.

[0154] FIG. 1, panel B provides additional details of the example integrated system. As shown, nucleic acids are dispensed from diversity generation module 1 into microtiter trays (as described below, many alternative configurations do not use such trays, instead using other liquid (e.g., microfluidic) or solid phase arrays). For example, the diversified DNAs (or other nucleic acids) are dispensed into first tray or set of trays 10 at about 0-100 unique DNA molecules/well to provide for straightforward interpretation of results from the system. Commonly, each well can contain 0-10 unique molecules. For example, each well can contain, on average, 0-5, or e.g., 0-3 unique molecules. That is, if there are only 1 or a few nucleic acid molecule member types per array position it is easier to identify which array members produce a desirable activity. However, arrays of pooled members can be used, in which pools having an activity of interest are subsequently deconvoluted (e.g., re-arrayed by limiting dilution and the pool members tested for any activity of interest). In this context, the term “unique” refers to nucleic acids of differing lengths or sequences.
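
To illustrate why low average loading simplifies interpretation, the sketch below estimates well occupancy when molecules are dispensed by dilution. It assumes, as is conventional for limiting dilution but not stated in the text, that the number of unique molecules per well follows a Poisson distribution with the chosen mean; the function names are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """Probability that a well receives exactly k molecules."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def well_occupancy(mean):
    """Fractions of wells that are empty, hold exactly one molecule,
    or hold more than one, for a given mean loading per well."""
    empty = poisson_pmf(0, mean)
    single = poisson_pmf(1, mean)
    return {"empty": empty, "single": single,
            "multiple": 1.0 - empty - single}
```

At a mean of one molecule per well, for instance, the empty and single-molecule fractions are equal (each `exp(-1)`), and the remaining wells hold pooled members that would need deconvolution if they score as hits.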

[0155] A nucleic acid master array is produced by amplifying the members of the first tray (the amplified members are accessible for further operations), e.g., as indicated by PCR process amplification step(s) 15. One or more copies of this master array (20, 21) is optionally produced (e.g., by aliquotting or otherwise transferring materials from the original to the copies) for further access by the system in subsequent procedures. Either the original or the duplicate of the master array can be in vitro transcribed (if appropriate; the copying procedure (represented by in vitro transcription process step 25) can produce DNA or RNA copies (e.g., as represented by mRNA copy array 30), and the original can be DNA or RNA, as desired) and/or translated in vitro to produce a product of interest (e.g., a biologically active RNA, protein, or the like, represented by protein/RNA array 40). This is represented by in vitro transcription process step(s) 35.

[0156] The product is assayed as appropriate on primary assay plate 50 which optionally includes substrates or other relevant components. Secondary assays (i.e., assays for activities which differ from the first activity) can also be run in secondary assay modules.

[0157] Typically, a product quantification module such as a protein quantification/purification module 60 is used to normalize the activity level of the product, i.e., to detect and/or account for variation in product concentrations. Protein quantitation module 60 allows arraying at uniform concentration for specific activities. Aliquots of existing proteins can be rearrayed and reassayed, e.g., on secondary assay plate 70. New protein can be reproduced from mRNA or dsDNA, quantified and reassayed.

[0158] Detector elements are typically included in protein quantitation module 60 to detect product activities of interest (hits). Optionally, hit picking software and/or hardware is used to select hits (other software elements control sample manipulation and transfer between modules and respond to user inputs). The system determines which nucleic acids in the master array the hits correspond to and either identifies the hits to the user or uses corresponding nucleic acids from the original or copy master array in subsequent diversity generation reactions, such as in additional shuffling reactions in the diversity generation module.
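
A minimal sketch of hit picking along these lines is given below. It assumes per-well activity readings, per-well product concentrations from the quantitation module, and a recorded mapping from assay plate positions back to master array positions; all names, and the use of activity divided by concentration as the ranking criterion, are assumptions made for illustration rather than features recited above.

```python
def pick_hits(activity, concentration, assay_to_master, top_n=2):
    """Rank wells by specific activity and report the master array
    positions of the top hits."""
    specific = {}
    for pos, act in activity.items():
        conc = concentration.get(pos, 0.0)
        if conc > 0:
            # Normalize for product concentration at this position.
            specific[pos] = act / conc
    ranked = sorted(specific, key=specific.get, reverse=True)[:top_n]
    # Translate assay positions into master array positions so the
    # corresponding nucleic acids can be retrieved for the next round.
    return [(assay_to_master[pos], specific[pos]) for pos in ranked]
```

The returned master array positions could either be reported to the user or passed to the diversity generation module as inputs for a further round of shuffling.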

[0159] In general, in FIG. 1, arrows between plates indicate processes that can be used to produce new plates, or which can be performed on existing plates.

[0160] II. Methods and System Elements for Generating Nucleic Acid Diversity

[0161] A variety of diversity generating protocols (e.g., mutation, including recombination and other methods) are available and described in the art. The procedures can be used separately, and/or in combination to produce one or more variants of a nucleic acid or set of nucleic acids, as well as variants of encoded proteins. Individually and collectively, these procedures provide robust, widely applicable ways of generating diversified nucleic acids and sets of nucleic acids (including, e.g., nucleic acid libraries) useful, e.g., for the engineering or rapid evolution of nucleic acids, proteins, pathways, cells and/or organisms with new and/or improved characteristics.

[0162] While distinctions and classifications are made in the course of the ensuing discussion for clarity, it will be appreciated that the techniques are often not mutually exclusive. Indeed, the various methods can be used singly or in combination, in parallel or in series, to provide diverse sequence variants.

[0163] The result of any of the diversity generating procedures described herein can be the generation of one or more nucleic acids, which can be selected or screened for nucleic acids that encode proteins or bioactive RNAs (e.g., catalytic RNAs) with or which confer new or desirable properties. Following diversification by one or more of the methods herein, or otherwise available to one of skill, any nucleic acids that are produced can be selected for a desired activity or property, e.g., for use in the automated systems and methods herein. This can include identifying any activity that can be detected, for example, in an automated or automatable format, by any of the assays in the art or herein. A variety of related (or even unrelated) properties can be evaluated, in serial or in parallel, at the discretion of the practitioner.

[0164] As noted, a variety of diversity generating/product screening reactions can be automated by the methods set forth herein. One important class of such reactions is “nucleic acid shuffling” or “DNA shuffling” methods. In these methods, any of a variety of recombination-based diversity generating procedures can be used to diversify starting nucleic acids, or organisms comprising nucleic acids, or even to diversify character strings which are “in silico” (in computer) representations of nucleic acids (or both). Diverse nucleic acids/character strings/organisms which are generated by such methods are typically screened for one or more activities. Nucleic acids, character strings, or organisms which comprise nucleic acids are then optionally used as substrates in subsequent recombination reactions, the products of which are, again, screened for one or more activities. This process is optionally repeated recursively until one or more desirable products are produced.
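The recursive diversify-and-screen cycle described above can be sketched on character strings. This is a toy illustration only: the single-point crossover, the scoring function (matches to a hypothetical target string) and all parameter values are assumptions chosen for brevity, not the procedures of the cited references.

```python
import random


def recombine(parents, rng):
    """Single-point crossover between two randomly chosen equal-length parents."""
    a, b = rng.sample(parents, 2)
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def score(seq, target):
    """Toy screen: number of positions that match a hypothetical target string."""
    return sum(x == y for x, y in zip(seq, target))


def evolve(parents, target, rounds=5, library_size=50, keep=4, seed=0):
    """Repeat recombination and screening, carrying the best strings forward."""
    rng = random.Random(seed)
    pool = list(parents)
    for _ in range(rounds):
        library = [recombine(pool, rng) for _ in range(library_size)]
        # Screen the new library together with the current pool, so the best
        # score can never decrease; dict.fromkeys removes duplicates in order.
        candidates = list(dict.fromkeys(library + pool))
        pool = sorted(candidates, key=lambda s: score(s, target), reverse=True)[:keep]
    return pool


parents = ["AAAACCCC", "GGGGAAAA", "AAGGCCAA", "GGAAAACC"]
target = "GGGGCCCC"
best = evolve(parents, target)[0]
```

Each pass through the loop corresponds to one round of library generation followed by screening, with the selected strings serving as substrates for the next round.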

[0165] A variety of diversity generating protocols, including nucleic acid shuffling protocols, are available and fully described in the art. The following publications describe a variety of recursive recombination and other mutational procedures and/or methods which can be incorporated into such procedures, as well as other diversity generating protocols: Soong, N. et al. (2000) “Molecular breeding of viruses” Nat Genet 25(4): 436-439; Stemmer, et al., (1999) “Molecular breeding of viruses for targeting and other clinical properties” Tumor Targeting 4: 1-4; Ness et al. (1999) “DNA Shuffling of subgenomic sequences of subtilisin” Nature Biotechnology 17: 893-896; Chang et al. (1999) “Evolution of a cytokine using DNA family shuffling” Nature Biotechnology 17: 793-797; Minshull and Stemmer (1999) “Protein evolution by molecular breeding” Current Opinion in Chemical Biology 3: 284-290; Christians et al. (1999) “Directed evolution of thymidine kinase for AZT phosphorylation using DNA family shuffling” Nature Biotechnology 17: 259-264; Crameri et al. (1998) “DNA shuffling of a family of genes from diverse species accelerates directed evolution” Nature 391: 288-291; Crameri et al. (1997) “Molecular evolution of an arsenate detoxification pathway by DNA shuffling,” Nature Biotechnology 15: 436-438; Zhang et al. (1997) “Directed evolution of an effective fucosidase from a galactosidase by DNA shuffling and screening” Proceedings of the National Academy of Sciences, U.S.A. 94: 4504-4509; Patten et al. (1997) “Applications of DNA Shuffling to Pharmaceuticals and Vaccines” Current Opinion in Biotechnology 8: 724-733; Crameri et al. (1996) “Construction and evolution of antibody-phage libraries by DNA shuffling” Nature Medicine 2: 100-103; Crameri et al. (1996) “Improved green fluorescent protein by molecular evolution using DNA shuffling” Nature Biotechnology 14: 315-319; Gates et al. (1996) “Affinity selective isolation of ligands from peptide libraries through display on a lac repressor ‘headpiece dimer’” Journal of Molecular Biology 255: 373-386; Stemmer (1996) “Sexual PCR and Assembly PCR” In: The Encyclopedia of Molecular Biology. VCH Publishers, New York. pp. 447-457; Crameri and Stemmer (1995) “Combinatorial multiple cassette mutagenesis creates all the permutations of mutant and wildtype cassettes” BioTechniques 18: 194-195; Stemmer et al., (1995) “Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides” Gene, 164: 49-53; Stemmer (1995) “The Evolution of Molecular Computation” Science 270: 1510; Stemmer (1995) “Searching Sequence Space” Bio/Technology 13: 549-553; Stemmer (1994) “Rapid evolution of a protein in vitro by DNA shuffling” Nature 370: 389-391; and Stemmer (1994) “DNA shuffling by random fragmentation and reassembly: In vitro recombination for molecular evolution.” Proceedings of the National Academy of Sciences, U.S.A. 91: 10747-10751.

[0166] Additional available mutational methods of generating diversity include, for example, site-directed mutagenesis (Ling et al. (1997) “Approaches to DNA mutagenesis: an overview” Anal Biochem. 254(2): 157-178; Dale et al. (1996) “Oligonucleotide-directed random mutagenesis using the phosphorothioate method” Methods Mol. Biol. 57: 369-374; Smith (1985) “In vitro mutagenesis” Ann. Rev. Genet. 19: 423-462; Botstein & Shortle (1985) “Strategies and applications of in vitro mutagenesis” Science 229: 1193-1201; Carter (1986) “Site-directed mutagenesis” Biochem. J. 237: 1-7; and Kunkel (1987) “The efficiency of oligonucleotide directed mutagenesis” in Nucleic Acids & Molecular Biology (Eckstein, F. and Lilley, D. M. J. eds., Springer Verlag, Berlin)); mutagenesis using uracil containing templates (Kunkel (1985) “Rapid and efficient site-specific mutagenesis without phenotypic selection” Proc. Natl. Acad. Sci. USA 82: 488-492; Kunkel et al. (1987) “Rapid and efficient site-specific mutagenesis without phenotypic selection” Methods in Enzymol. 154: 367-382; and Bass et al. (1988) “Mutant Trp repressors with new DNA-binding specificities” Science 242: 240-245); oligonucleotide-directed mutagenesis (Methods in Enzymol. 100: 468-500 (1983); Methods in Enzymol. 154: 329-350 (1987); Zoller & Smith (1982) “Oligonucleotide-directed mutagenesis using M13-derived vectors: an efficient and general procedure for the production of point mutations in any DNA fragment” Nucleic Acids Res. 10: 6487-6500; Zoller & Smith (1983) “Oligonucleotide-directed mutagenesis of DNA fragments cloned into M13 vectors” Methods in Enzymol. 100: 468-500; and Zoller & Smith (1987) “Oligonucleotide-directed mutagenesis: a simple method using two oligonucleotide primers and a single-stranded DNA template” Methods in Enzymol. 154: 329-350); phosphorothioate-modified DNA mutagenesis (Taylor et al. (1985) “The use of phosphorothioate-modified DNA in restriction enzyme reactions to prepare nicked DNA” Nucl. Acids Res. 13: 8749-8764; Taylor et al. (1985) “The rapid generation of oligonucleotide-directed mutations at high frequency using phosphorothioate-modified DNA” Nucl. Acids Res. 13: 8765-8787; Nakamaye & Eckstein (1986) “Inhibition of restriction endonuclease Nci I cleavage by phosphorothioate groups and its application to oligonucleotide-directed mutagenesis” Nucl. Acids Res. 14: 9679-9698; Sayers et al. (1988) “5'-3' Exonucleases in phosphorothioate-based oligonucleotide-directed mutagenesis” Nucl. Acids Res. 16: 791-802; and Sayers et al. (1988) “Strand specific cleavage of phosphorothioate-containing DNA by reaction with restriction endonucleases in the presence of ethidium bromide” Nucl. Acids Res. 16: 803-814); mutagenesis using gapped duplex DNA (Kramer et al. (1984) “The gapped duplex DNA approach to oligonucleotide-directed mutation construction” Nucl. Acids Res. 12: 9441-9456; Kramer & Fritz (1987) “Oligonucleotide-directed construction of mutations via gapped duplex DNA” Methods in Enzymol. 154: 350-367; Kramer et al. (1988) “Improved enzymatic in vitro reactions in the gapped duplex DNA approach to oligonucleotide-directed construction of mutations” Nucl. Acids Res. 16: 7207; and Fritz et al. (1988) “Oligonucleotide-directed construction of mutations: a gapped duplex DNA procedure without enzymatic reactions in vitro” Nucl. Acids Res. 16: 6987-6999).

[0167] Additional suitable methods include point mismatch repair (Kramer et al. (1984) “Point Mismatch Repair” Cell 38: 879-887), mutagenesis using repair-deficient host strains (Carter et al. (1985) “Improved oligonucleotide site-directed mutagenesis using M13 vectors” Nucl. Acids Res. 13: 4431-4443; and Carter (1987) “Improved oligonucleotide-directed mutagenesis using M13 vectors” Methods in Enzymol. 154: 382-403), deletion mutagenesis (Eghtedarzadeh & Henikoff (1986) “Use of oligonucleotides to generate large deletions” Nucl. Acids Res. 14: 5115), restriction-selection and restriction-purification (Wells et al. (1986) “Importance of hydrogen-bond formation in stabilizing the transition state of subtilisin” Phil. Trans. R. Soc. Lond. A 317: 415-423), mutagenesis by total gene synthesis (Nambiar et al. (1984) “Total synthesis and cloning of a gene coding for the ribonuclease S protein” Science 223: 1299-1301; Sakmar and Khorana (1988) “Total synthesis and expression of a gene for the α-subunit of bovine rod outer segment guanine nucleotide-binding protein (transducin)” Nucl. Acids Res. 14: 6361-6372; Wells et al. (1985) “Cassette mutagenesis: an efficient method for generation of multiple mutations at defined sites” Gene 34: 315-323; and Grundström et al. (1985) “Oligonucleotide-directed mutagenesis by microscale ‘shot-gun’ gene synthesis” Nucl. Acids Res. 13: 3305-3316), double-strand break repair (Mandecki (1986) “Oligonucleotide-directed double-strand break repair in plasmids of Escherichia coli: a method for site-specific mutagenesis” Proc. Natl. Acad. Sci. USA, 83: 7177-7181; and Arnold (1993) “Protein engineering for unusual environments” Current Opinion in Biotechnology 4: 450-455). Additional details on many of the above methods can be found in Methods in Enzymology Volume 154, which also describes useful controls for trouble-shooting problems with various mutagenesis methods.

[0168] Additional details regarding DNA shuffling and other diversity generating methods are found in U.S. Patents by the inventors and their co-workers, including: U.S. Pat. No. 5,605,793 to Stemmer (Feb. 25, 1997), “METHODS FOR IN VITRO RECOMBINATION;” U.S. Pat. No. 5,811,238 to Stemmer et al. (Sep. 22, 1998) “METHODS FOR GENERATING POLYNUCLEOTIDES HAVING DESIRED CHARACTERISTICS BY ITERATIVE SELECTION AND RECOMBINATION;” U.S. Pat. No. 5,830,721 to Stemmer et al. (Nov. 3, 1998), “DNA MUTAGENESIS BY RANDOM FRAGMENTATION AND REASSEMBLY;” U.S. Pat. No. 5,834,252 to Stemmer, et al. (Nov. 10, 1998) “END-COMPLEMENTARY POLYMERASE REACTION,” and U.S. Pat. No. 5,837,458 to Minshull, et al. (Nov. 17, 1998), “METHODS AND COMPOSITIONS FOR CELLULAR AND METABOLIC ENGINEERING.”

[0169] In addition, details and formats for recursive recombination, e.g., DNA shuffling and other diversity generating protocols are found in a variety of PCT and foreign patent application publications, including: Stemmer and Crameri, “DNA MUTAGENESIS BY RANDOM FRAGMENTATION AND REASSEMBLY” WO 95/22625; Stemmer and Lipschutz “END COMPLEMENTARY POLYMERASE CHAIN REACTION” WO 96/33207; Stemmer and Crameri, “METHODS FOR GENERATING POLYNUCLEOTIDES HAVING DESIRED CHARACTERISTICS BY ITERATIVE SELECTION AND RECOMBINATION” WO 97/0078; Minshull and Stemmer, “METHODS AND COMPOSITIONS FOR CELLULAR AND METABOLIC ENGINEERING” WO 97/35966; Punnonen et al. “TARGETING OF GENETIC VACCINE VECTORS” WO 99/41402; Punnonen et al. “ANTIGEN LIBRARY IMMUNIZATION” WO 99/41383; Punnonen et al. “GENETIC VACCINE VECTOR ENGINEERING” WO 99/41369; Punnonen et al. “OPTIMIZATION OF IMMUNOMODULATORY PROPERTIES OF GENETIC VACCINES” WO 99/41368; Stemmer and Crameri, “DNA MUTAGENESIS BY RANDOM FRAGMENTATION AND REASSEMBLY” EP 0934999; Stemmer “EVOLVING CELLULAR DNA UPTAKE BY RECURSIVE SEQUENCE RECOMBINATION” EP 0932670; Stemmer et al., “MODIFICATION OF VIRUS TROPISM AND HOST RANGE BY VIRAL GENOME SHUFFLING” WO 99/23107; Apt et al., “HUMAN PAPILLOMAVIRUS VECTORS” WO 99/21979; Del Cardayre et al. “EVOLUTION OF WHOLE CELLS AND ORGANISMS BY RECURSIVE SEQUENCE RECOMBINATION” WO 98/31837; Patten and Stemmer, “METHODS AND COMPOSITIONS FOR POLYPEPTIDE ENGINEERING” WO 98/27230; and Stemmer et al., “METHODS FOR OPTIMIZATION OF GENE THERAPY BY RECURSIVE SEQUENCE SHUFFLING AND SELECTION” WO 98/13487.

[0170] Certain U.S. applications provide additional details regarding various diversity generating methods, including “SHUFFLING OF CODON ALTERED GENES” by Patten et al. filed Sep. 28, 1999, (U.S. Ser. No. 09/407,800); “EVOLUTION OF WHOLE CELLS AND ORGANISMS BY RECURSIVE SEQUENCE RECOMBINATION”, by del Cardayre et al. filed Jul. 15, 1998 (U.S. Ser. No. 09/166,188), and Jul. 15, 1999 (U.S. Ser. No. 09/354,922); “OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID RECOMBINATION” by Crameri et al., filed Sep. 28, 1999 (U.S. Ser. No. 09/408,392), and “OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID RECOMBINATION” by Crameri et al., filed Jan. 18, 2000 (PCT/US00/01203); “USE OF CODON-VARIED OLIGONUCLEOTIDE SYNTHESIS FOR SYNTHETIC SHUFFLING” by Welch et al., filed Sep. 28, 1999 (U.S. Ser. No. 09/408,393); “METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES & POLYPEPTIDES HAVING DESIRED CHARACTERISTICS” by Selifonov et al., filed Jan. 18, 2000, (PCT/US00/01202) and, e.g., “METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES & POLYPEPTIDES HAVING DESIRED CHARACTERISTICS” by Selifonov et al., filed Jul. 18, 2000 (U.S. Ser. No. 09/618,579); “METHODS OF POPULATING DATA STRUCTURES FOR USE IN EVOLUTIONARY SIMULATIONS” by Selifonov and Stemmer, filed Jan. 18, 2000 (PCT/US00/01138); and “SINGLE-STRANDED NUCLEIC ACID TEMPLATE-MEDIATED RECOMBINATION AND NUCLEIC ACID FRAGMENT ISOLATION” by Affholter, filed Sept. 6, 2000 (U.S. Ser. No. 09/656,549).

[0171] As a review of the foregoing publications, patents, published applications and U.S. patent applications reveals, recursive recombination and other mutation methods for modifying nucleic acids to provide new nucleic acids with desired (e.g., new or improved) properties can be carried out by a number of established methods, and these procedures can be combined with any of a variety of other diversity generating methods. The following exemplify some of the different formats for diversity generation in the context of the present invention, including, e.g., certain recombination based diversity generation formats. Many additional formats are provided in the references above and herein, and can be adapted to use in the systems and methods herein.

[0172] For example, several different general classes of recombination methods are applicable to the present invention and set forth in the references above. First, nucleic acids can be recombined in vitro by any of a variety of techniques discussed in the references above, including e.g., DNAse digestion of nucleic acids to be recombined followed by ligation and/or PCR reassembly of the nucleic acids. Second, nucleic acids can be recursively recombined in vivo, e.g., by allowing recombination to occur between nucleic acids in cells. Third, whole genome recombination methods can be used in which whole genomes of cells or other organisms are recombined, optionally including spiking of the genomic recombination mixtures with desired library components. Fourth, synthetic recombination methods can be used, in which oligonucleotides corresponding to targets of interest are synthesized and reassembled in PCR or ligation reactions which include oligonucleotides which correspond to more than one parental nucleic acid, thereby generating new recombined nucleic acids. Oligonucleotides can be made by standard, single nucleotide addition methods, or by methods in which dinucleotides, trinucleotides or longer oligomers are added in at least one synthetic cycle, for example, to limit or expand the number of codons which may be present at a given position within a synthetic or semi-synthetic gene. Moreover, recombined nucleic acids may be generated either from a starting pool of single stranded oligonucleotides or by first annealing at least one single-stranded oligomer to a complement sequence, thus forming a starting pool of preannealed double stranded oligonucleotides. Fifth, in silico methods of recombination can be effected in which genetic algorithms are used in a computer to recombine sequence strings which correspond to nucleic acid homologues (or even non-homologous sequences). The resulting recombined sequence strings are optionally converted into nucleic acids by synthesis of nucleic acids which correspond to the recombined sequences, e.g., in concert with oligonucleotide synthesis/gene reassembly techniques. Sixth, methods of accessing natural diversity, e.g., by hybridization of diverse nucleic acids or nucleic acid fragments to single-stranded templates, followed by polymerization and/or ligation to regenerate full-length sequences, optionally followed by degradation of the templates and recovery of the resulting modified nucleic acids, can be used. Any of the preceding general recombination formats can be practiced in a reiterative fashion to generate a more diverse set of recombinant nucleic acids.

[0173] Thus, as noted, nucleic acids can be recombined in vitro by any of a variety of techniques discussed in the references above, including e.g., DNAse digestion of nucleic acids to be recombined followed by ligation and/or PCR reassembly of the nucleic acids. For example, sexual PCR mutagenesis can be used in which random (or pseudo random, or even non-random) fragmentation of the DNA molecule is followed by recombination, based on sequence similarity, between DNA molecules with different but related DNA sequences, in vitro, followed by fixation of the crossover by extension in a polymerase chain reaction. This process and many process variants are described in several of the references above, e.g., in Stemmer (1994) Proc. Natl. Acad. Sci. USA 91: 10747-10751. The present invention provides various automated formats and related devices for practicing such methods.

[0174] Similarly, nucleic acids can be recursively recombined in vivo, e.g., by allowing recombination to occur between nucleic acids in cells. Many such in vivo recombination formats are set forth in the references noted above. Such formats optionally provide direct recombination between nucleic acids of interest, or provide recombination between vectors, viruses, plasmids, etc., comprising the nucleic acids of interest, as well as other formats. Details regarding such procedures are found in the references noted above. Here again, the present invention provides various automated formats and related devices for practicing such methods.

[0175] In addition, whole genome recombination methods can also be used in which whole genomes of cells or other organisms are recombined, optionally including spiking of the genomic recombination mixtures with desired library components (e.g., genes corresponding to the pathways of the present invention). These methods have many applications, including those in which the identity of a target gene is not known. Details on such methods are found, e.g., in WO 98/31837 by del Cardayre et al. “Evolution of Whole Cells and Organisms by Recursive Sequence Recombination;” and in, e.g., PCT/US99/15972 by del Cardayre et al., also entitled “Evolution of Whole Cells and Organisms by Recursive Sequence Recombination.” The present invention provides various automated formats and related devices for practicing such methods.

[0176] As noted, synthetic recombination methods can also be used, in which oligonucleotides corresponding to targets of interest are synthesized and reassembled in PCR or ligation reactions which include oligonucleotides which correspond to more than one parental nucleic acid, thereby generating new recombined nucleic acids. Oligonucleotides can be made by standard nucleotide addition methods, or can be made, e.g., by tri-nucleotide or other synthetic approaches. Details regarding such approaches are found in the references noted above, including, e.g., “OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID RECOMBINATION” by Crameri et al., filed Sep. 28, 1999 (U.S. Ser. No. 09/408,392), and “OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID RECOMBINATION” by Crameri et al., filed Jan. 18, 2000 (PCT/US00/01203); “USE OF CODON-VARIED OLIGONUCLEOTIDE SYNTHESIS FOR SYNTHETIC SHUFFLING” by Welch et al., filed Sep. 28, 1999 (U.S. Ser. No. 09/408,393); “METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES & POLYPEPTIDES HAVING DESIRED CHARACTERISTICS” by Selifonov et al., filed Jan. 18, 2000, (PCT/US00/01202); “METHODS OF POPULATING DATA STRUCTURES FOR USE IN EVOLUTIONARY SIMULATIONS” by Selifonov and Stemmer (PCT/US00/01138), filed Jan. 18, 2000; and, e.g., “METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES & POLYPEPTIDES HAVING DESIRED CHARACTERISTICS” by Selifonov et al., filed Jul. 18, 2000 (U.S. Ser. No. 09/618,579). These procedures are especially amenable to use in the automated systems and methods herein.

[0177] For example, in silico methods of recombination can be effected in which genetic algorithms (GAs) or genetic operators (GOs) are used in a computer to recombine sequence strings which correspond to homologous (or even non-homologous) nucleic acids. The resulting recombined sequence strings are optionally converted into nucleic acids by synthesis of nucleic acids which correspond to the recombined sequences, e.g., in concert with oligonucleotide synthesis/gene reassembly techniques. This approach can generate random, partially random or designed variants. Many details regarding in silico recombination, including the use of genetic algorithms, genetic operators and the like in computer systems, combined with generation of corresponding nucleic acids (and/or proteins), as well as combinations of designed nucleic acids and/or proteins (e.g., based on cross-over site selection) as well as designed, pseudo-random or random recombination methods are described in “METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES & POLYPEPTIDES HAVING DESIRED CHARACTERISTICS” by Selifonov et al., filed Jan. 18, 2000, (PCT/US00/01202); “METHODS OF POPULATING DATA STRUCTURES FOR USE IN EVOLUTIONARY SIMULATIONS” by Selifonov and Stemmer (PCT/US00/01138), filed Jan. 18, 2000; and, e.g., “METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES & POLYPEPTIDES HAVING DESIRED CHARACTERISTICS” by Selifonov et al., filed Jul. 18, 2000 (U.S. Ser. No. 09/618,579). Extensive details regarding in silico recombination methods are found in these applications.
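As an illustration of a designed genetic operator with cross-over site selection, the following sketch recombines two aligned sequence strings only at positions where both parents share a short run of identical characters. The window size, function names and example sequences are assumptions made for this sketch and are not drawn from the cited applications.

```python
def crossover_sites(a, b, window=3):
    """Positions where two aligned parents share `window` identical characters."""
    if len(a) != len(b):
        raise ValueError("parents must be aligned to the same length")
    return [i for i in range(len(a) - window + 1) if a[i:i + window] == b[i:i + window]]


def cross(a, b, site):
    """Genetic operator: parent a up to `site`, parent b from `site` onward."""
    return a[:site] + b[site:]


def all_single_crossovers(a, b, window=3):
    """Enumerate the distinct single-crossover chimeras of two parents."""
    children = set()
    for site in crossover_sites(a, b, window):
        children.add(cross(a, b, site))
        children.add(cross(b, a, site))
    return sorted(children - {a, b})


# Two hypothetical parental strings that differ at positions 5 and 11.
parent_a = "ATGGCTAAAGGT"
parent_b = "ATGGCAAAAGGC"
chimeras = all_single_crossovers(parent_a, parent_b)
# chimeras == ["ATGGCAAAAGGT", "ATGGCTAAAGGC"]
```

The resulting strings could then be passed to an oligonucleotide synthesis and gene reassembly step to produce the corresponding nucleic acids.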

[0178] Many methods of accessing natural diversity, e.g., by hybridization of diverse nucleic acids or nucleic acid fragments to single-stranded templates, followed by polymerization and/or ligation to regenerate full-length sequences, optionally followed by degradation of the templates and recovery of the resulting modified nucleic acids, can be similarly used. In one method employing a single-stranded template, the fragment population derived from the genomic library(ies) is annealed with partial, or, often approximately full length, ssDNA or RNA corresponding to the opposite strand. Assembly of complex chimeric genes from this population is then mediated by nuclease-based removal of non-hybridizing fragment ends, polymerization to fill gaps between such fragments and subsequent single stranded ligation. The parental polynucleotide strand can be removed by digestion (e.g., if RNA or uracil-containing), magnetic separation under denaturing conditions (if labeled in a manner conducive to such separation) and other available separation/purification methods. Alternatively, the parental strand is optionally co-purified with the chimeric strands and removed during subsequent screening and processing steps. Additional details regarding this approach are found, e.g., in “SINGLE-STRANDED NUCLEIC ACID TEMPLATE-MEDIATED RECOMBINATION AND NUCLEIC ACID FRAGMENT ISOLATION” by Affholter, U.S. Ser. No. 09/656,549, filed Sept. 6, 2000. Further details on adaptation of these methods to the present invention are found supra.

[0179] In another approach, single-stranded molecules are converted to double-stranded DNA (dsDNA) and the dsDNA molecules are bound to a solid support by ligand-mediated binding. After separation of unbound DNA, the selected DNA molecules are released from the support and introduced into a suitable host cell to generate a library enriched in sequences which hybridize to the probe. A library produced in this manner provides a desirable substrate for further diversification using any of the procedures described herein. Further details on this approach are provided herein.

[0180] Any of the preceding general mutation or recombination formats can be practiced in a reiterative fashion (e.g., one or more cycles of mutation/recombination or other diversity generation methods, optionally followed by one or more selection methods) to generate a more diverse set of recombinant nucleic acids.

[0181] In general, the above references provide many basic mutation and recombination formats as well as many modifications of these formats. Regardless of the format which is used, the nucleic acids of the invention can be recombined (with each other or with related (or even unrelated) nucleic acids) to produce a diverse set of recombinant nucleic acids, including, e.g., sets of homologous nucleic acids.

[0182] Following recombination and/or other forms of mutation, any nucleic acids which are produced can be selected for a desired activity. In the context of the present invention, this can include testing for and identifying any activity that can be detected in an automatable format, by any of the assays in the art. A variety of related (or even unrelated) properties can be assayed for, using any available assay. These methods are automated according to the present invention as described herein. As noted, DNA recombination and other forms of mutagenesis, separately or in combination, provide robust, widely applicable means of generating diversity useful for the engineering of nucleic acids, proteins, pathways, cells and organisms to provide new or improved characteristics.

[0183] It is often desirable to combine multiple diversity generating methodologies when generating diversity. For example, in conjunction with (or separately from) shuffling methods, a variety of mutation methods can be practiced and the results (i.e., diverse populations of nucleic acids) screened for in the systems of the invention. Additional diversity can be introduced by methods which result in the alteration of individual nucleotides or groups of contiguous or non-contiguous nucleotides, i.e., mutagenesis methods. Further details on certain example mutation methodologies are provided below.

[0184] In one aspect, error-prone PCR is used, in which, e.g., PCR is performed under conditions where the copying fidelity of the DNA polymerase is low, such that a high rate of point mutations is obtained along the entire length of the PCR product. Examples of such techniques are found in the references above and, e.g., in Leung et al. (1989) Technique 1: 11-15 and Caldwell et al. (1992) PCR Methods Applic. 2: 28-33. Similarly, assembly PCR can be used, in a process which involves the assembly of a PCR product from a mixture of small DNA fragments. A large number of different PCR reactions can occur in parallel in the same vial, with the products of one reaction priming the products of another reaction. Sexual PCR mutagenesis can be used in which homologous recombination occurs in vitro between DNA molecules of different but related DNA sequence, following random fragmentation of the DNA molecules, with recombination based on sequence homology and fixation of the crossover by primer extension in a PCR reaction. This process is described in the references above, e.g., in Stemmer (1994) PNAS 91: 10747-10751. Recursive ensemble mutagenesis can be used in which an algorithm for protein mutagenesis is used to produce diverse populations of phenotypically related mutants whose members differ in amino acid sequence. This method uses a feedback mechanism to control successive rounds of combinatorial cassette mutagenesis. Examples of this approach are found in Arkin and Youvan PNAS USA 89: 7811-7815 (1992).
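The effect of low copying fidelity in error-prone PCR can be illustrated with a simple in silico model. This is a sketch under stated assumptions: every molecule is copied exactly once per cycle, errors are independent point substitutions at a fixed per-base rate, and the template sequence and rate are hypothetical values chosen for illustration.

```python
import random

BASES = "ACGT"


def error_prone_copy(template, error_rate, rng):
    """Copy a template, substituting a different base with probability error_rate."""
    copied = []
    for base in template:
        if rng.random() < error_rate:
            copied.append(rng.choice([b for b in BASES if b != base]))
        else:
            copied.append(base)
    return "".join(copied)


def error_prone_pcr(template, cycles, error_rate, seed=0):
    """Return the final pool; each cycle copies every molecule once, doubling the pool."""
    rng = random.Random(seed)
    pool = [template]
    for _ in range(cycles):
        pool += [error_prone_copy(molecule, error_rate, rng) for molecule in pool]
    return pool


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


template = "ATGGCTAAAGGTCCTGAA"
pool = error_prone_pcr(template, cycles=6, error_rate=0.02)
mutation_counts = [hamming(template, molecule) for molecule in pool]
```

Because copies are made from earlier copies, mutations accumulate over cycles and are distributed along the full length of the product, consistent with the description above.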

[0185] As noted, oligonucleotide directed mutagenesis can be used in a process which allows for the generation of site-specific mutations in any cloned DNA segment of interest. Examples of such techniques are found in the references above and, e.g., in Reidhaar-Olson et al. (1988) Science, 241: 53-57. Similarly, cassette mutagenesis can be used in a process which replaces a small region of a double stranded DNA molecule with a synthetic oligonucleotide cassette that differs from the native sequence. The oligonucleotide can contain, e.g., completely and/or partially randomized native sequence(s).
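Cassette mutagenesis with a partially randomized cassette can be sketched in silico as follows. The gene string, cassette coordinates and randomized offsets are hypothetical, and the sketch simply enumerates every variant rather than modeling synthesis or cloning.

```python
import itertools


def cassette_library(gene, start, end, randomized_offsets, alphabet="ACGT"):
    """Replace gene[start:end] with cassettes randomized at the given offsets.

    Offsets are relative to the start of the cassette; all other cassette
    positions keep the native sequence.
    """
    native = gene[start:end]
    variants = []
    for combo in itertools.product(alphabet, repeat=len(randomized_offsets)):
        cassette = list(native)
        for offset, base in zip(randomized_offsets, combo):
            cassette[offset] = base
        variants.append(gene[:start] + "".join(cassette) + gene[end:])
    return variants


# Hypothetical example: randomize the first codon of a six-base cassette.
gene = "ATGGCTAAAGGTCCT"
library = cassette_library(gene, start=3, end=9, randomized_offsets=[0, 1, 2])
```

Randomizing three positions over four bases gives 4^3 = 64 variants, one of which is the native sequence; the flanking regions outside the cassette are unchanged in every variant.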

[0186] In vivo mutagenesis can be used in a process of generating random mutations in any cloned DNA of interest which involves the propagation of the DNA, e.g., in a strain of E. coli that carries mutations in one or more of the DNA repair pathways. These “mutator” strains have a higher random mutation rate than that of a wild-type parent. Propagating the DNA in one of these strains will eventually generate random mutations within the DNA.

[0187] Exponential ensemble mutagenesis can be used for generating combinatorial libraries with a high percentage of unique and functional mutants, where small groups of residues are randomized in parallel to identify, at each altered position, amino acids which lead to functional proteins. Examples of such procedures are found in Delegrave and Youvan (1993) Biotechnology Research, 11: 1548-1552. Similarly, random and site-directed mutagenesis can be used. Examples of such procedures are found in Arnold (1993) Current Opinion in Biotechnology, 4: 450-455.

[0188] Many kits for mutagenesis are also commercially available. For example, kits are available from, e.g., Stratagene (e.g., the QuikChange site-directed mutagenesis kit; and the Chameleon double-stranded, site-directed mutagenesis kit), Bio/Can Scientific, Bio-Rad (e.g., using the Kunkel method described above), Boehringer Mannheim Corp., Clontech Laboratories, DNA Technologies, Epicentre Technologies (e.g., 5 prime 3 prime kit), Genpak Inc, Lemargo Inc, Life Technologies (Gibco BRL), New England Biolabs, Pharmacia Biotech, Promega Corp., Quantum Biotechnologies, Amersham International plc (e.g., using the Eckstein method above), and Anglian Biotechnology ltd (e.g., using the Carter/Winter method above).

[0189] Any of the described shuffling or mutagenesis techniques can be used in conjunction with procedures which introduce additional diversity into a genome, e.g., a eukaryotic or bacterial genome. For example, in addition to the methods above, techniques have been proposed which produce chimeric nucleic acid multimers suitable for transformation into a variety of species, including E. coli and B. subtilis (see, e.g., Schellenberger U.S. Pat. No. 5,756,316 and the references above). When such chimeric multimers, consisting of genes that are divergent with respect to one another (e.g., derived from natural diversity or through application of site directed mutagenesis, error prone PCR, passage through mutagenic bacterial strains, and the like), are transformed into a suitable host, this provides a source of nucleic acid diversity for DNA diversification.

[0190] In one aspect, a multiplicity of monomeric polynucleotides sharing regions of partial sequence similarity can be transformed into a host species and recombined in vivo by the host cell. Subsequent rounds of cell division can be used to generate libraries, members of which include a single, homogenous population, or pool, of monomeric polynucleotides. Alternatively, the monomeric nucleic acid can be recovered by standard techniques, e.g., PCR and/or cloning, and recombined in any of the recombination formats, including recursive recombination formats, described above.

[0191] Methods for generating multispecies expression libraries have been described (in addition to the reference noted above, see, e.g., Peterson et al. (1998) U.S. Pat. No. 5,783,431 “METHODS FOR GENERATING AND SCREENING NOVEL METABOLIC PATHWAYS,” and Thompson, et al. (1998) U.S. Pat. No. 5,824,485 “METHODS FOR GENERATING AND SCREENING NOVEL METABOLIC PATHWAYS”) and their use to identify protein activities of interest has been proposed (in addition to the references noted above, see, Short (1999) U.S. Pat. No. 5,958,672 “PROTEIN ACTIVITY SCREENING OF CLONES HAVING DNA FROM UNCULTIVATED MICROORGANISMS”). Multispecies expression libraries include, in general, libraries comprising cDNA or genomic sequences from a plurality of species or strains, operably linked to appropriate regulatory sequences, in an expression cassette. The cDNA and/or genomic sequences are optionally randomly ligated to further enhance diversity. The vector can be a shuttle vector suitable for transformation and expression in more than one species of host organism, e.g., bacterial species, eukaryotic cells. In some cases, the library is biased by preselecting sequences which encode a protein of interest, or which hybridize to a nucleic acid of interest. Any such libraries can be provided as substrates for any of the methods herein described.

[0192] Chimeric multimers transformed into host species are suitable as substrates for in vivo shuffling protocols. Alternatively, a multiplicity of polynucleotides sharing regions of partial sequence similarity can be transformed into a host species and recombined in vivo by the host cell. Subsequent rounds of cell division can be used to generate libraries, members of which comprise a single, homogenous population of monomeric or pooled nucleic acid. Alternatively, the monomeric nucleic acid can be recovered by standard techniques and recursively recombined in any of the described shuffling formats.

[0193] Chain termination methods of diversity generation have also been proposed (see, e.g., U.S. Pat. No. 5,965,408 and the references above). In this approach, double stranded DNAs corresponding to one or more genes sharing regions of sequence similarity are combined and denatured, in the presence or absence of primers specific for the gene. The single stranded polynucleotides are then annealed and incubated in the presence of a polymerase and a chain terminating reagent (e.g., UV, gamma or X-ray irradiation; ethidium bromide or other intercalators; DNA binding proteins, such as single strand binding proteins, transcription activating factors, or histones; polycyclic aromatic hydrocarbons; trivalent chromium or a trivalent chromium salt; or abbreviated polymerization mediated by rapid thermocycling; and the like), resulting in the production of partial duplex molecules. The partial duplex molecules, e.g., containing partially extended chains, are then denatured and reannealed in subsequent rounds of replication or partial replication, resulting in polynucleotides which share varying degrees of sequence similarity and which are chimeric with respect to the starting population of DNA molecules. Optionally, the products or partial pools of the products can be amplified at one or more stages in the process. Polynucleotides produced by a chain termination method, such as described above, are suitable substrates for DNA shuffling according to any of the described formats.

[0194] Diversity can also be generated using, for example, incremental truncation for the creation of hybrid enzymes (ITCHY), described in Ostermeier et al. (1999) “A combinatorial approach to hybrid enzymes independent of DNA homology” Nature Biotech 17: 1205. This approach can be used to generate an initial recombinant library which serves as a substrate for one or more rounds of in vitro or in vivo shuffling methods. Any homology or non-homology based mutation/recombination format can be used to generate diversity, separately or in combination.

[0195] In some applications, it is desirable to preselect or prescreen libraries (e.g., an amplified library, a genomic library, a cDNA library, a normalized library, etc.) or other substrate nucleic acids prior to shuffling, or to otherwise bias the substrates towards nucleic acids that encode functional products (shuffling procedures can also independently have these effects). For example, in the case of antibody engineering, it is possible to bias the shuffling process toward antibodies with functional antigen binding sites by taking advantage of in vivo recombination events prior to DNA shuffling by any described method. For example, recombined CDRs derived from B cell cDNA libraries can be amplified and assembled into framework regions (e.g., Jirholt et al. (1998) “Exploiting sequence space: shuffling in vivo formed complementarity determining regions into a master framework” Gene 215: 471) prior to DNA shuffling according to any of the methods described herein.

[0196] Libraries can be biased towards nucleic acids which encode proteins with desirable enzyme activities. For example, after identifying a clone from a library which exhibits a specified activity, the clone can be mutagenized using any known method for introducing DNA alterations, including, but not restricted to, DNA shuffling. A library comprising the mutagenized homologues is then screened for a desired activity, which can be the same as or different from the initially specified activity. An example of such a procedure is proposed in U.S. Pat. No. 5,939,250. Desired activities can be identified by any method known in the art. For example, WO 99/10539 proposes that gene libraries can be screened by combining extracts from the gene library with components obtained from metabolically rich cells and identifying combinations which exhibit the desired activity. It has also been proposed (e.g., WO 98/58085) that clones with desired activities can be identified by inserting bioactive substrates into samples of the library, and detecting bioactive fluorescence corresponding to the product of a desired activity using a fluorescent analyzer, e.g., a flow cytometry device, a CCD, a fluorometer, or a spectrophotometer.

[0197] Libraries can also be biased towards nucleic acids which have specified characteristics, e.g., hybridization to a selected nucleic acid probe. For example, application WO 99/10539 proposes that polynucleotides encoding a desired activity (e.g., an enzymatic activity, for example: a lipase, an esterase, a protease, a glycosidase, a glycosyl transferase, a phosphatase, a kinase, an oxygenase, a peroxidase, a hydrolase, a hydratase, a nitrilase, a transaminase, an amidase or an acylase) can be identified from among genomic DNA sequences in the following manner. Single stranded DNA molecules from a population of genomic DNA are hybridized to a ligand-conjugated probe. The genomic DNA can be derived from either a cultivated or uncultivated microorganism, or from an environmental sample. Alternatively, the genomic DNA can be derived from a multicellular organism, or a tissue derived therefrom. Second strand synthesis can be conducted directly from a hybridization probe used in the capture, with or without prior release from the capture medium, or by a wide variety of other strategies known in the art. Alternatively, the isolated single-stranded genomic DNA population can be fragmented without further cloning and used directly in a shuffling-based gene reassembly process. In one such method the fragment population derived from the genomic library(ies) is annealed with partial, or, often, approximately full length ssDNA or RNA corresponding to the opposite strand. Assembly of complex chimeric genes from this population is then mediated by nuclease-based removal of non-hybridizing fragment ends, polymerization to fill gaps between such fragments and subsequent single stranded ligation. The parental strand can be removed by digestion (if RNA or uracil-containing), magnetic separation under denaturing conditions (if labeled in a manner conducive to such separation) and other available separation/purification methods. Alternatively, the parental strand is optionally co-purified with the chimeric strands and removed during subsequent screening and processing steps. As detailed, e.g., in “SINGLE-STRANDED NUCLEIC ACID TEMPLATE-MEDIATED RECOMBINATION AND NUCLEIC ACID FRAGMENT ISOLATION” by Affholter, U.S. Ser. No. 60/186,482 filed Mar. 2, 2000, and U.S. Ser. No. 09/656,549, filed Sep. 6, 2000, shuffling using single-stranded templates and nucleic acids of interest which bind to a portion of the template can also be performed.

[0198] “Non-Stochastic” methods of generating nucleic acids and polypeptides are proposed in Short “Non-Stochastic Generation of Genetic Vaccines and Enzymes” WO 00/46344. These methods, including proposed non-stochastic polynucleotide reassembly and site-saturation mutagenesis methods, can be applied to the present invention as well. Random or semi-random mutagenesis using doped or degenerate oligonucleotides is also described in, e.g., Arkin and Youvan (1992) “Optimizing nucleotide mixtures to encode specific subsets of amino acids for semi-random mutagenesis” Biotechnology 10: 297-300; Reidhaar-Olson et al. (1991) “Random mutagenesis of protein sequences using oligonucleotide cassettes” Methods Enzymol. 208: 564-86; Lim and Sauer (1991) “The role of internal packing interactions in determining the structure and stability of a protein” J. Mol. Biol. 219: 359-76; Breyer and Sauer (1989) “Mutational analysis of the fine specificity of binding of monoclonal antibody 51F to lambda repressor” J. Biol. Chem. 264: 13355-60; and “Walk-Through Mutagenesis” (Crea, R; U.S. Pat. Nos. 5,830,650 and 5,798,208, and EP Patent 0527809 B1).

[0199] In one approach, described in more detail herein, single-stranded molecules are converted to double-stranded DNA (dsDNA) and the dsDNA molecules are bound to a solid support by ligand-mediated binding. After separation of unbound DNA, the selected DNA molecules are released from the support and introduced into a suitable host cell to generate a library enriched for sequences which hybridize to the probe. A library produced in this manner provides a desirable substrate for any of the shuffling reactions described herein.

[0200] It will further be appreciated that any of the above described techniques suitable for enriching a library prior to shuffling can be used to screen the products generated by the methods of DNA shuffling.

[0201] The above references provide many mutational formats, including recombination, recursive recombination, mutation by non-recombination directed methods, and recursive mutation in any format, as well as many modifications of these formats. Regardless of the diversity generation format that is used, the nucleic acids of the invention can be recombined (with each other, or with related (or even unrelated) sequences) to produce a diverse set of recombinant nucleic acids, including, e.g., sets of homologous nucleic acids, as well as corresponding polypeptides.

Non-PCR Based Recombination Methods

[0202] As noted above, site-directed or oligonucleotide-directed mutagenesis methods can be used to generate chimeras between 2 or more parental genes. Many methods are described in the literature and some are listed herein that do not depend on PCR, though PCR-based methods are also fully described herein and useful in the context of the present invention.

[0203] A common theme to many non-PCR based methods is preparation of a single-stranded template to which primers (e.g., synthetic oligonucleotides, single-stranded DNA or RNA fragments) are annealed, then elongated by a DNA or RNA polymerase in the presence of dNTPs and appropriate buffer. The gapped duplex can be sealed with DNA ligase prior to transformation or electroporation into E. coli. In some instances, e.g., where a substantially coextensive heterolog is generated by annealing of multiple primers to a template, ligase alone is sufficient to produce a recombinant DNA strand. In some instances, e.g., where there are “flaps” of nucleic acid which do not hybridize to the template, an exo- or endo-nuclease can be used to eliminate unhybridized portions of a bound nucleic acid prior to polymerase and/or ligase treatment.

[0204] The newly synthesized strand is replicated and generates a chimeric gene with contributions from the oligo in the context of the single-stranded (ss) parent. The ss template can be prepared, e.g., by incorporation of the phage IG region into the plasmid and use of a helper phage such as M13KO7 or R408 to package ss plasmids into filamentous phage particles. The ss template can also be generated by denaturation of a double-stranded template and annealing in the presence of the primers. Methods vary, e.g., in the enrichment protocols for isolation of the newly synthesized chimeric strand over the parental template strand and are described in the references below. The “Kunkel” method uses uracil-containing templates. The Eckstein method uses phosphorothioate-modified DNA. Restriction selection or purification can be used in conjunction with mismatch repair deficient strains.

[0205] In the context of the present invention, the “mutagenic” primer described in these methods can be one or more synthetic oligonucleotides encoding any type of randomization, insertion, deletion, family gene shuffling oligonucleotides based on sequence diversity of homologous genes, etc. Oligos that randomize particular sequences (e.g., NNG/C), encode conservative replacements for particular residues (e.g., NUN for hydrophobic residues), spiked oligos where the correct nucleotide sequence is synthesized in the background of a low level of all 3 mismatched nucleotides, incorporation of deoxyinosine or other ambiguous nucleotide analogs, insertions, deletions, error prone PCR, etc. can be used. The primer(s) can also be, e.g., fragments of homologous genes that are annealed to the ss parent template. In this way chimeras between 2 or more parental genes can be generated.
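By way of illustration only, the effect of a randomizing oligo such as NNG/C can be examined by enumerating the concrete codons, and the amino acids, that a degenerate codon encodes. The following is a minimal sketch under stated assumptions: the function name `expand_degenerate_codon` is hypothetical, DNA letters are used (so NUN is written NTN), NNG/C is written with the IUPAC code S (C or G), and the standard genetic code is assumed.

```python
from itertools import product

# IUPAC degenerate base codes (only the subset needed for this sketch).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "CG", "N": "ACGT"}

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def expand_degenerate_codon(codon):
    """Map each concrete codon covered by a degenerate codon to its amino acid."""
    choices = [IUPAC[base] for base in codon.upper()]
    return {"".join(c): CODON_TABLE["".join(c)] for c in product(*choices)}

nns = expand_degenerate_codon("NNS")  # NNG/C: randomizes a position
ntn = expand_degenerate_codon("NTN")  # NUN in RNA letters: hydrophobic residues
print(len(nns), sorted(set(nns.values())))
print(sorted(set(ntn.values())))
```

Such an enumeration makes it easy to compare candidate degenerate codons, for example by the number of stop codons they admit.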

[0206] Multiple primers can anneal to a given template and be extended to create multiply chimeric genes. DNA polymerases such as those from phages T4 or T7 are well suited for this purpose, as they will not degrade or displace a downstream primer from the template.

[0207] In one class of preferred embodiments, the ss template or one or more primers (e.g., mutagenic primers) is immobilized on a solid substrate such as a chip or a membrane. In other embodiments, annealing and extension occur in a liquid phase array, such as in a reaction solution within wells of a microtiter plate or an arrangement of test tubes.

[0208] Example: DNA Shuffling Using Uracil Containing Templates

[0209] For example, in one aspect, a gene of interest is cloned into an E. coli plasmid containing the filamentous phage intergenic (IG, ori) region. Single stranded (ss) plasmid DNA is packaged into phage particles upon infection with a helper phage such as M13KO7 (Pharmacia) or R408 and is purified by methods such as phenol/chloroform extraction and ethanol precipitation. If this DNA is prepared in a dut⁻ ung⁻ strain of E. coli, a small number of uracil residues are incorporated into it in place of normal thymine residues. The ratio of the amount of uracil residues to the amount of thymidine residues used typically depends on the desired nucleic acid fragment size. The ratio is optionally calculated using appropriate software or instruction sets as described below. The instructions are typically programmed into a diversity generation device of the invention, e.g., in a computer readable format in a computer operably coupled to a diversity generation device or directly into a thermocycler used in a diversity generation device.
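The kind of calculation such an instruction set might perform can be sketched as follows. This is a hypothetical model, not a procedure taken from the present description: it assumes that uracil replaces thymine independently at each thymine position and that the mean spacing between uracils approximates the desired fragment size. The function name and the default thymine fraction of 0.25 are likewise assumptions.

```python
def uracil_fraction_for_fragment_size(mean_fragment_size, thymine_fraction=0.25):
    """Estimate the fraction of thymine positions to replace with uracil.

    Assumed model: uracils are placed independently at thymine positions,
    so the mean spacing between uracils is 1 / (fraction * thymine_fraction).
    """
    if mean_fragment_size <= 0:
        raise ValueError("mean_fragment_size must be positive")
    if not 0 < thymine_fraction <= 1:
        raise ValueError("thymine_fraction must be in (0, 1]")
    fraction = 1.0 / (mean_fragment_size * thymine_fraction)
    # A fraction above 1 cannot be realized; cap at full substitution.
    return min(fraction, 1.0)

# Example: target fragments of about 100 bases in a template that is 25% thymine.
print(uracil_fraction_for_fragment_size(100, 0.25))
```

Under this model, smaller desired fragments call for a proportionally higher uracil fraction; an empirically calibrated relationship would replace the formula in practice.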

[0210] One or more primers as defined above are annealed to the ss uracil-containing template by heating to 90° C. and slowly cooling to room temperature. An appropriate buffer containing all 4 deoxyribonucleotides, T7 DNA polymerase and T4 DNA ligase is added to the annealed template/primer mix and incubated at between room temperature and 37° C. for ≥1 hour. The T7 DNA polymerase extends from the 3′ end of the primer and synthesizes a complementary strand to the template incorporating the primer. DNA ligase seals the gap between the 3′ end of the newly synthesized strand and the 5′ end of the primer. If multiple primers are used, then the polymerase will extend to the next primer, stop (preferentially, polymerases that are arrested by downstream bound nucleic acids are used for this purpose) and ligase will seal the gap. As noted above, an exonuclease can be employed, e.g., prior to polymerase treatment.

[0211] The products of these reactions are then transformed into an ung⁺ strain of E. coli and antibiotic selection for the plasmid is applied. Uracil N-glycosylase (the ung gene product) enzyme in the host cell recognizes the uracil in the template strand and removes it, creating apyrimidinic sites that are either not replicated or which are corrected by the host repair systems using the newly synthesized strand as a template. The resulting plasmids predominantly contain the desired change in the gene of interest. If multiple primers are used then it is possible simultaneously to introduce numerous changes in a single reaction. If the primers are derived from fragments of homologous genes, then multiply chimeric genes can be generated.

[0212] Any of these diversity generating methods (shuffling, mutagenesis, etc.) can be combined with each other, in any combination selected by the user, to produce nucleic acid diversity, which may be screened for using any available screening method. The section below entitled “Diversity Generation Modules” provides further details regarding generation of diversity in the devices, modules and systems of the present invention.

[0213] A. Diversity Generation Modules

[0214] The automated production of diverse libraries can be used to increase the throughput of forced evolution methods. A variety of diversity production strategies can be used. Shuffling and other diversity generating modules of the invention provide a convenient way to generate diversity from starting nucleic acids. Diversity generation modules automate one or more relevant diversity generating processes.

[0215] For example, the diversity generation module can take the form of a nucleic acid shuffling or mutagenesis module which can accept input nucleic acids or character strings corresponding to input nucleic acids and can manipulate the input nucleic acids or the character strings corresponding to input nucleic acids to produce output nucleic acids. In addition, the diversity generation modules of the invention are optionally used to select appropriate input nucleic acids or character strings corresponding to input nucleic acids which are typically shuffled to produce output nucleic acids. In any case, the output nucleic acids can comprise the one or more shuffled or mutagenized nucleic acids in the reaction mixture arrays of the invention, or fragments thereof. In addition to performing diversity-generation reactions, the diversity generation module optionally separates, identifies, purifies, immobilizes or otherwise treats diversified nucleic acids for further analysis.

[0216] Common formats for the diversity generation module can include computer systems for designing and selecting nucleic acids, oligonucleotide synthesizers, and liquid handlers for moving and mixing reagents (e.g., microwell plates, automatic pipettors, peristaltic pumps, etc.). The nucleic acid shuffling module can include one or more microscale channel through which a shuffling reagent or product is flowed, which can be integrated in a chip, or present in a series of microcapillaries.

[0217] For example, in addition to, in conjunction with, or in place of a standard automatic pipetting station and set of microwell plates, devices or integrated systems can include physical or logical arrays of reaction mixtures incorporated into the automatic pipetting station and set of microwell plates, or into a microscale device. Alternately, at least one of the reaction mixtures can be incorporated into a microscale device or a delivery system which interfaces with the automatic pipetting station and set of microwell plates. In one embodiment, the one or more shuffled or mutagenized nucleic acids (or a transcribed form thereof) can be found within a microscale device or the microwell plates, or the one or more in vitro transcription or translation reagents can be found within the plates or the microscale device. Any reagent associated with any operation of the module can be found within standard robotic systems, or in a microscale device, or in microwell plates, or on solid substrates or other storage systems as noted herein, and any operation or set of operations for the module can be performed in a microscale or milliscale format. Thus, all or part of the module can be embodied in one or more automatic pipetting station, robotic fluid handling systems, in microcapillary systems (e.g., including integrated microchannel devices), or combinations thereof.

[0218] (1.) Selection and Acquisition of Targets for Diversity Generation Processes

[0219] The identification and acquisition of nucleic acid targets for diversity generation can be performed by the diversity generating modules of the invention. For example, selection algorithms can be used to identify sequences in public or proprietary databases which meet any user-selected criterion as a target for diversity generation. These user criteria include activity, encoded activity, homology, public availability, and any other criteria of interest. In addition, character strings corresponding to nucleic acids (or their derived polypeptides) can be generated according to any set of criteria selected by the user, including similarity to existing sequences, modification of an existing sequence according to any desired modification parameter (genetic algorithm, etc.), random or non-random (e.g., weighted) sequence generation, etc. Data structures comprising diverse sequences can be formed in a digital or analog computer or in a computer readable medium and the data structures converted from character strings to nucleic acids (e.g., via automated synthesis protocols) for subsequent physical manipulations. Alternatively, the character strings are manipulated or shuffled “in silico” to produce diverse nucleic acids, based upon any genetic algorithm or operator selected by the practitioner.
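As one hedged illustration of weighted (non-random) sequence generation, the sketch below draws a nucleotide character string in which each base is sampled according to user-supplied weights. The function name `weighted_random_sequence` and the example weights are assumptions made for illustration; a seeded generator is passed in so that a run can be reproduced.

```python
import random

def weighted_random_sequence(length, weights, rng):
    """Generate a character string with bases drawn according to `weights`.

    weights: dict mapping each base to a non-negative relative weight.
    rng: a random.Random instance (seed it for reproducible output).
    """
    bases = sorted(weights)
    base_weights = [weights[b] for b in bases]
    return "".join(rng.choices(bases, weights=base_weights, k=length))

# Example: an AT-rich 30-mer (weights are illustrative only).
at_rich = weighted_random_sequence(
    30, {"A": 0.35, "T": 0.35, "G": 0.15, "C": 0.15}, random.Random(0))
print(at_rich)
```

The resulting character string could then be passed to an automated synthesis protocol or manipulated further in silico.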

[0220] Either computer data or nucleic acids can be “data structures,” a term which refers to the organization and optionally associated device for the storage of information, typically comprising multiple “pieces” of information. The data structure can be a simple recordation of the information (e.g., a list) or the data structure can contain additional information (e.g., annotations) regarding the information contained therein, can establish relationships between the various “members” (information “pieces”) of the data structure, and can provide pointers or be linked to resources external to the data structure. The data structure can be intangible but is rendered tangible when stored/represented in tangible medium (e.g., in a computer medium, a nucleic acid or set of nucleic acids, or the like). The data structure can represent various information architectures including, but not limited to, simple lists, linked lists, indexed lists, data tables, indexes, hash indices, flat file databases, relational databases, local databases, distributed databases, thin client databases, and/or the like.

[0221] Nucleic acids can be selected by the user based upon sequence similarity to one or more additional nucleic acid. Different types of similarity and considerations of various stringency and character string length can be detected and recognized in the target acquisition phase of the invention. For example, many homology determination methods have been designed for comparative analysis of sequences of biopolymers, for spell-checking in word processing, and for data retrieval from various databases. With an understanding of double-helix pair-wise complement interactions among the principal nucleobases in natural polynucleotides, models that simulate annealing of complementary homologous polynucleotide strings can also be used as a foundation of sequence alignment or other operations typically performed on the character strings corresponding to the sequences of interest (e.g., word-processing manipulations, construction of figures comprising sequence or subsequence character strings, output tables, etc.). An example of a dedicated software package with genetic algorithms for calculating sequence similarity and other operations of interest is BLAST, which can be used in the present invention to select target sequences (e.g., based upon homology) for acquisition and supply to the diversity generating modules of the invention.

[0222] BLAST is described in Altschul et al., J. Mol. Biol. 215: 403-410 (1990). Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/). This algorithm first identifies high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, a cutoff of 100, M=5, N=−4, and a comparison of both strands. For amino acid (protein) sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff (1992) Proc. Natl. Acad. Sci. USA 89: 10915).
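The extension step described above can be illustrated with a simplified, ungapped sketch for nucleotide strings. This is not the BLAST implementation: it extends a single seed in one direction only, the function name `extend_right` is hypothetical, and the drop-off value X=20 is an arbitrary assumption (the description above gives no default for X). The match and mismatch scores follow the M=5 and N=−4 defaults quoted above.

```python
def extend_right(query, subject, q_start, s_start, word_len,
                 match=5, mismatch=-4, x_drop=20):
    """Extend a seed word hit to the right without gaps.

    Returns (best_score, best_length), where best_length counts aligned
    positions from the start of the seed at the maximum cumulative score.
    """
    # Score the seed word itself.
    score = 0
    for k in range(word_len):
        same = query[q_start + k] == subject[s_start + k]
        score += match if same else mismatch
    best_score, best_len = score, word_len

    k = word_len
    # Halt at the end of either sequence ...
    while q_start + k < len(query) and s_start + k < len(subject):
        same = query[q_start + k] == subject[s_start + k]
        score += match if same else mismatch
        k += 1
        if score > best_score:
            best_score, best_len = score, k
        # ... or when the score drops by X from its maximum, or reaches zero.
        elif best_score - score >= x_drop or score <= 0:
            break
    return best_score, best_len

# Example: eight matching bases followed by a run of mismatches.
q = "ACGTAAAA" + "TTTTTTTT"
s = "ACGTAAAA" + "GGGGGGGG"
print(extend_right(q, s, 0, 0, word_len=4, x_drop=10))  # (40, 8)
```

A full implementation would extend in both directions, use a scoring matrix for amino acid sequences, and evaluate the resulting HSP against the expectation threshold.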

[0223] An additional example of a useful sequence alignment algorithm is PILEUP. PILEUP creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments. It can also plot a tree showing the clustering relationships used to create the alignment. PILEUP uses a simplification of the progressive alignment method of Feng & Doolittle, J. Mol. Evol. 25: 351-360 (1987). The method used is similar to the method described by Higgins & Sharp, CABIOS 5: 151-153 (1989). The program can align, e.g., up to 300 sequences of a maximum length of 5,000 letters. The multiple alignment procedure begins with the pairwise alignment of the two most similar sequences, producing a cluster of two aligned sequences. This cluster can then be aligned to the next most related sequence or cluster of aligned sequences. Two clusters of sequences can be aligned by a simple extension of the pairwise alignment of two individual sequences. The final alignment is achieved by a series of progressive, pairwise alignments. The program can also be used to plot a dendrogram or tree representation of clustering relationships. The program is run by designating specific sequences and their amino acid or nucleotide coordinates for regions of sequence comparison.
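The order in which a progressive method joins sequences can be sketched without performing the alignments themselves. The following minimal example is not PILEUP: it assumes that pairwise similarity scores are already available and that clusters are compared by the average similarity of their members (an assumed linkage rule). The function name `join_order` and the example scores are hypothetical.

```python
from itertools import combinations

def join_order(names, similarity):
    """Return the sequence of merges for a progressive alignment.

    names: list of sequence names.
    similarity: dict mapping frozenset({a, b}) to a pairwise similarity score.
    Each merge is a pair of clusters, where a cluster is a tuple of names.
    """
    def average_similarity(pair):
        left, right = pair
        scores = [similarity[frozenset((a, b))] for a in left for b in right]
        return sum(scores) / len(scores)

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        # Join the two most similar sequences or clusters first.
        left, right = max(combinations(clusters, 2), key=average_similarity)
        merges.append((left, right))
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left + right)
    return merges

scores = {
    frozenset(("s1", "s2")): 0.9,
    frozenset(("s1", "s3")): 0.5,
    frozenset(("s2", "s3")): 0.6,
}
for merge in join_order(["s1", "s2", "s3"], scores):
    print(merge)
```

The list of merges corresponds to the tree of clustering relationships: the first entry is the initial pairwise alignment and each later entry aligns a sequence or cluster to an existing cluster.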

[0224] As noted, the diversity generation module can comprise a DNA shuffling module. In one preferred embodiment, this module accepts input nucleic acids such as DNAs or character strings corresponding to input DNAs and manipulates the input DNAs or the character strings corresponding to input DNAs to produce output DNAs, which output DNAs comprise the one or more shuffled DNAs in the reaction mixture array. This can be performed by physical manipulation of nucleic acids as noted above, or character strings in computer systems, or both. For example, in addition to simply selecting nucleic acids of interest, computer systems can be used to produce character strings which correspond to nucleic acid targets for diversity generation. A variety of genetic algorithms for modifying character strings which correspond to biopolymers are set forth in detail in, e.g., “METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES & POLYPEPTIDES HAVING DESIRED CHARACTERISTICS” by Selifonov et al., filed Feb. 5, 1999 (U.S. Ser. No. 60/118,854), U.S. Ser. No. 09/416,375 filed Oct. 12, 1999, Application No. PCT/US00/01202, filed Jan. 18, 2000, and, e.g., U.S. Ser. No. 09/618,579 filed Jul. 18, 2000. These genetic algorithms (GAs) include, e.g., modifying nucleic acid sequences to correspond to physical mutation events such as point mutation, nucleotide insertion, deletion, recombination and the like. Sequences can also be tested for fitness or any other parameter, including multidimensional parameters, by parameterizing any selection criteria and then selecting sequences which fall within the hyperspace defined by the set of parameters. Combinations of automated design (e.g., protein design automation, or “PDA”), e.g., to select cross-over points for recombination based upon, e.g., physical (e.g., presence of encoded protein or other domains) or statistical (e.g., principal component analysis (“PCA”), Markov modeling, neural networks, etc.) criteria and random approaches (e.g., physical recombination of synthesized nucleic acids) can also be used. Further details on such approaches are found in the applications noted above.
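The genetic operators named above (point mutation, insertion, deletion and recombination) can be illustrated on character strings as follows. This is a minimal sketch, not the algorithms of the cited applications; the function names, the four-letter DNA alphabet and the single-crossover form of recombination are assumptions made for the example.

```python
import random

ALPHABET = "ACGT"

def point_mutation(seq, rng):
    """Replace one randomly chosen base with a different base."""
    i = rng.randrange(len(seq))
    new_base = rng.choice([b for b in ALPHABET if b != seq[i]])
    return seq[:i] + new_base + seq[i + 1:]

def insertion(seq, rng):
    """Insert one random base at a random position."""
    i = rng.randrange(len(seq) + 1)
    return seq[:i] + rng.choice(ALPHABET) + seq[i:]

def deletion(seq, rng):
    """Delete one randomly chosen base."""
    i = rng.randrange(len(seq))
    return seq[:i] + seq[i + 1:]

def recombination(a, b, rng):
    """Single crossover: the head of `a` joined to the tail of `b`."""
    i = rng.randrange(1, min(len(a), len(b)))
    return a[:i] + b[i:]

rng = random.Random(1)
parent = "ACGTACGT"
print(point_mutation(parent, rng))
print(insertion(parent, rng))
print(deletion(parent, rng))
print(recombination("AAAA", "CCCC", rng))
```

Operators of this kind can be applied repeatedly, with the resulting strings scored against any parameterized selection criteria before the selected strings are synthesized.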

[0225] For example, the present methods for selecting nucleic acids for shuffling are used to ensure that the parental sequences chosen for diversity generation supply sufficient diversity yet can be recombined or shuffled in practice. Typically, sequences are chosen for recombination/shuffling based on percent homology, or based on phylogenetic relationships. Typically, a level of at least 50% sequence homology is required for efficient recombination between a pair of sequences. However, this general limit can be overcome by the introduction of additional (wild type, naturally occurring or synthetic) sequences which ‘bridge’ the diversity within any given sequence pair. This module may act to enhance recombinational efficiency within a sequence population by further prescribing the synthesis or addition of a limited set of additional sequences not resident within the initial parental sequences. The likelihood that any two or more parents are compatible for recombination/shuffling is a consequence of the chance of recombination occurring during the process. Frequency of recombination is a direct consequence of the melting point of the hybrid molecule. Phylogenetic relationship and/or percent homology provide indirect measurements of the same thing. Therefore, the following method is optionally used to provide an improved selection of sequences for diversity generation. The method is an automated process by which parental sequences are found, scored and chosen for shuffling based on melting temperature. In addition, parental divergence is calculated and scored to enable an experimenter to make an informed decision upon choosing parental nucleic acids for shuffling.

[0226] In one embodiment, a set of nucleic acid sequences or character strings corresponding to nucleic acid sequences is selected using a computer or set of instructions embodied in a computer readable medium, e.g., on a web page. Such a method typically comprises performing an alignment, e.g., a pairwise alignment, between two or more potential parental nucleic acid sequences, e.g., using ClustalW or one or more of the programs described herein. Potential parental nucleic acid sequences are also optionally selected using a computer, e.g., by searching one or more databases for one or more nucleic acid sequences of interest and one or more homologs of the one or more nucleic acid sequences of interest.

[0227] The number of mismatches in the alignment is then calculated. Melting temperatures for one or more windows of w bases in the alignment are also calculated, identifying those windows having a melting temperature greater than x. Melting temperatures are optionally calculated from one or more sets of empirical data or one or more melting temperature prediction algorithms. A window of w bases typically comprises, e.g., about 21 bases. Preferably, w is an odd number and the melting temperature cutoff, x, is typically about 65° C.
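The window scan described above can be sketched in Python as follows. This is a minimal illustration only, not the prediction algorithm of the method: the function names `window_tm` and `scan_alignment`, the GC-content formula, and the penalty of 1° C. per 1% mismatch are all assumptions chosen for simplicity, and any empirical data set or melting temperature predictor could be substituted.

```python
def window_tm(top: str, bottom: str) -> float:
    """Rough melting temperature (deg C) for one aligned window.

    ASSUMPTION: a simple GC-content formula with a penalty of 1 deg C per
    1% mismatch. Gap characters ('-') are counted as mismatches.
    """
    n = len(top)
    matched = [a for a, b in zip(top, bottom) if a == b and a != "-"]
    gc = sum(1 for base in matched if base in "GC")
    mismatches = n - len(matched)
    base_tm = 64.9 + 41.0 * (gc - 16.4) / n   # assumed GC-content estimate
    return base_tm - 100.0 * mismatches / n   # assumed mismatch penalty


def scan_alignment(seq_a: str, seq_b: str, w: int = 21, x: float = 65.0):
    """Return (mismatch count, start positions of windows with Tm > x)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b or a == "-")
    hot_starts = []
    for start in range(len(seq_a) - w + 1):
        tm = window_tm(seq_a[start:start + w], seq_b[start:start + w])
        if tm > x:
            hot_starts.append(start)
    return mismatches, hot_starts
```

Here `seq_a` and `seq_b` are the two rows of a pairwise alignment, and the defaults w = 21 and x = 65 follow the typical values given above.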

[0228] One or more crossover segments in the alignment are then identified. A crossover segment is one comprising two or more windows having a melting temperature greater than x, which two or more windows are separated by no more than n nucleotides, with n typically about 2. FIG. 33 illustrates the melting temperature for a pairwise hybridization. In this example, the line indicates the melting temperature cutoff point and the arrows indicate various crossover segments.
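One possible way to group qualifying windows into crossover segments is sketched below. The function name `crossover_segments` is hypothetical, and the sketch assumes that "separated by no more than n nucleotides" is measured as the difference between the start positions of consecutive qualifying windows; other readings of the separation rule are equally possible.

```python
def crossover_segments(hot_starts, w=21, n=2):
    """Group qualifying window starts into crossover segments.

    ASSUMPTION: two qualifying windows belong to the same segment when
    their start positions differ by no more than n. A run needs at least
    two windows to count as a segment. Each segment is returned as
    (first_base, last_base), inclusive, in alignment coordinates.
    """
    segments = []
    run = []
    for start in sorted(hot_starts):
        if run and start - run[-1] > n:
            # The gap is too large: close the current run.
            if len(run) >= 2:
                segments.append((run[0], run[-1] + w - 1))
            run = []
        run.append(start)
    if len(run) >= 2:
        segments.append((run[0], run[-1] + w - 1))
    return segments
```

For example, `crossover_segments([0, 1, 2, 10, 30, 32])` yields two segments, because the isolated window starting at position 10 does not pair with any neighbor.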

[0229] The dispersion, e.g., the inverse of the average number of bases between crossover segments in the alignment, for the crossover segments identified is then typically calculated. The above calculations are then combined to provide two scores, e.g., a shuffleability score and a diversity capture score, for each alignment pair.

[0230] The shuffleability score is based on the number of windows having a melting temperature greater than x, the dispersion, and the number of crossover segments identified. For example, the number of windows, the dispersion, and the number of segments are multiplied together. This score reflects how well the aligned sequences would cross over during a shuffling reaction, e.g., in silico shuffling or shuffling in another diversity generation device of the invention, and how much of the sequences are likely to be shuffled.

[0231] The diversity capture score is based on the number of mismatches in the alignment, the number of windows having a melting temperature greater than x, the dispersion, and the number of crossover segments identified. The score is representative not only of how well the sequences would recombine, but also of how well recombining these sequences together would create diversity.
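The two scores can be illustrated with the following Python sketch. The shuffleability product follows the example given above. The remaining choices are assumptions made only for illustration: the distance between segments is measured between segment midpoints, a pair with fewer than two segments is given a dispersion of zero, and the diversity capture score is taken as the mismatch count multiplied by the shuffleability score, since the text does not fix a formula for it.

```python
def dispersion(segments):
    """Inverse of the average number of bases between crossover segments.

    ASSUMPTION: distance is measured between the midpoints of consecutive
    segments, and 0.0 is returned when fewer than two segments exist.
    """
    mids = sorted((first + last) / 2.0 for first, last in segments)
    if len(mids) < 2:
        return 0.0
    gaps = [b - a for a, b in zip(mids, mids[1:])]
    if sum(gaps) == 0:
        return 0.0
    return len(gaps) / sum(gaps)


def shuffleability(n_windows, segments):
    """Windows x dispersion x number of segments, as in the example above."""
    return n_windows * dispersion(segments) * len(segments)


def diversity_capture(n_mismatches, n_windows, segments):
    """ASSUMPTION: mismatch count multiplied by the shuffleability score."""
    return n_mismatches * shuffleability(n_windows, segments)
```

With these definitions, a pair of sequences with many mismatches but few crossover segments scores low on both measures, while a pair with no mismatches scores zero for diversity capture regardless of how readily it recombines.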

[0232] The sequences are then ranked according to one or both of the above scores and sequences for shuffling are selected based on the ranks. To further evaluate the sequences for shuffleability, the above steps are optionally repeated, e.g., starting with the one or more parental nucleic acids selected in the first cycle. Alternatively, the steps are repeated starting with the same or different potential parental nucleic acid sequences using one or more different input parameters, e.g., for calculating the melting temperature.

[0233] The above methods are optionally used, e.g., with varying potential parental sequences and melting temperature parameters, e.g., to optimize the diversity capture score while minimizing the amount of parental sequences needed for shuffling. In addition, the algorithm is optionally used with certain restrictions, e.g., that a particularly desirable parent or parents must be included in the final set of parents. For example, the method could be set up to walk between two parental sequences of interest. “Walking” refers to the process by which recombinations are obtained between two low homology parental sequences via intermediate sequences, i.e., A recombines with B, which recombines with C, which recombines with D, wherein A and D do not directly recombine.
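“Walking” can be viewed as finding a path through a graph in which each parental sequence is a node and an edge joins any two parents that are expected to recombine. The Python sketch below uses a breadth-first search to find the shortest such chain. The function name `walk` and the caller-supplied predicate `can_recombine` are hypothetical; in practice the predicate might be a threshold on the shuffleability score of the pair.

```python
from collections import deque


def walk(parents, can_recombine, start, goal):
    """Shortest chain of parents linking start to goal, or None.

    `can_recombine(a, b)` is a caller-supplied predicate (an assumption of
    this sketch), e.g., a threshold on a pairwise shuffleability score.
    """
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Rebuild the chain by following the back-pointers.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for other in parents:
            if other not in previous and can_recombine(current, other):
                previous[other] = current
                queue.append(other)
    return None
```

For the example in the text, where A pairs with B, B with C, and C with D, the search returns the chain A, B, C, D; if no intermediates are supplied it returns `None`, indicating that bridging sequences would need to be added.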

[0234] Other parameters are also optionally optimized in the selection of parents or to modify the scoring. Such parameters include, but are not limited to, the activity of the various parents, freedom to operate clearance, e.g., by an automatic search through a patent or literature database, the feasibility of obtaining the parents, the expression levels of the parents, and the compatibility of the parents' coding sequences with the codon bias of one or more organisms.

[0235] For example, the above method is optionally used as described below, e.g., in an automated computerized format. A researcher submits a small molecule substrate or product, e.g., to a computer program, e.g., embodied in a diversity generation device or on a web page. A chemical structure comparison search is performed on the small molecule, e.g., using ISIS or another such database. Such comparison is optionally performed manually or using a computer. The small molecule and related structures or homologs are used to search one or more databases, e.g., KEGG, WIT, or the like, for genes that are related to or have an activity on one or more of the compounds of interest. The genes are used to find homologs for shuffling, e.g., by searching databases with tools such as BLAST, HMMER, FASTA, Smith-Waterman, and the like. The gene sequences found are reverse translated, e.g., to optimize shuffleability, optimize codon usage for a given host, and/or maximize the difference from a parent that is prohibited by a lack of freedom to operate. In some embodiments, it is desirable to have as few genes as possible for shuffling. Therefore, the genes are optionally weighted based on activity, species, environment, or diversity. A final set of parental sequences is determined based on the scores obtained as described above and the various weights given to each sequence. Oligonucleotides or character strings that correspond to oligonucleotides for gene synthesis based on the selected parental nucleic acids are then created, e.g., for synthetic shuffling or in silico shuffling.

[0236] Nucleic acids which hybridize to one another are often provided to the system as starting nucleic acids for recombination-based diversity generation procedures. Further, nucleic acid hybridization can be estimated and used as a basis for selection in a computer system, in a manner similar to selecting for sequence similarity as set forth above (similar sequences typically hybridize). Nucleic acids “hybridize” when they associate, typically in solution. Nucleic acids hybridize due to a variety of well characterized physico-chemical forces, such as hydrogen bonding, solvent exclusion, base stacking and the like and, thus, these interactions can be modeled. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes part I chapter 2, “Overview of principles of hybridization and the strategy of nucleic acid probe assays,” (Elsevier, N.Y.), as well as in Ausubel, supra. Hames and Higgins (1995) Gene Probes 1 IRL Press at Oxford University Press, Oxford, England, (Hames and Higgins 1) and Hames and Higgins (1995) Gene Probes 2 IRL Press at Oxford University Press, Oxford, England (Hames and Higgins 2) provide details on the synthesis, labeling, detection and quantification of DNA and RNA, including oligonucleotides.

[0237] “Stringent hybridization wash conditions” in the context of nucleic acid hybridization experiments such as Southern and northern hybridizations are sequence dependent, and are different under different environmental parameters. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993), supra, and in Hames and Higgins, 1 and 2. For purposes of the present invention, generally, “highly stringent” hybridization and wash conditions are selected to be about 5° C. lower than the thermal melting point (T_m) for the specific sequence at a defined ionic strength and pH. The T_m is the temperature (under defined ionic strength and pH) at which 50% of the test sequence hybridizes to a perfectly matched probe. Very stringent conditions are selected to be equal to the T_m for a particular probe.

[0238] An example of stringent hybridization conditions for hybridization of complementary nucleic acids which have more than 100 complementary residues on a filter in a Southern or northern blot is 50% formamide with 1 mg of heparin at 42° C., with the hybridization being carried out overnight. An example of stringent wash conditions is a 0.2×SSC wash at 65° C. for 15 minutes (see, Sambrook, supra for a description of SSC buffer). Often the high stringency wash is preceded by a low stringency wash to remove background probe signal. An example low stringency wash is 2×SSC at 40° C. for 15 minutes. In general, a signal to noise ratio of 5× (or higher) relative to that observed for an unrelated probe in the particular hybridization assay indicates detection of a specific hybridization. Comparative hybridization can be used to identify nucleic acids as inputs to the systems of the invention.

[0239] Providing nucleic acids which are identified or generated as noted above optionally takes one of two basic forms.

[0240] First, where a nucleic acid is selected which corresponds to a physically existent nucleic acid, that nucleic acid can be acquired by cloning, PCR amplification or other nucleic acid isolation methods as is common in the art. An introduction to such methods is found in available standard texts, including Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology volume 152 Academic Press, Inc., San Diego, Calif. (“Berger”); Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd Ed.), Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 1989 (“Sambrook”) and Current Protocols in Molecular Biology, F. M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc., (supplemented through 1999) (“Ausubel”). Examples of techniques sufficient to direct persons of skill through in vitro amplification methods, useful in identifying, isolating and cloning nucleic acid diversity targets, including the polymerase chain reaction (PCR), the ligase chain reaction (LCR), Qβ-replicase amplification and other RNA polymerase mediated techniques (e.g., NASBA), are found in Berger, Sambrook, and Ausubel, as well as Mullis et al., (1987) U.S. Pat. No. 4,683,202; PCR Protocols A Guide to Methods and Applications (Innis et al. eds) Academic Press Inc. San Diego, Calif. (1990) (Innis); Arnheim & Levinson (Oct. 1, 1990) C&EN 36-47; The Journal Of NIH Research (1991) 3, 81-94; Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86, 1173; Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87, 1874; Lomeli et al. (1989) J. Clin. Chem 35, 1826; Landegren et al., (1988) Science 241, 1077-1080; Van Brunt (1990) Biotechnology 8, 291-294; Wu and Wallace, (1989) Gene 4, 560; Barringer et al. (1990) Gene 89, 117, and Sooknanan and Malek (1995) Biotechnology 13: 563-564. Improved methods of cloning in vitro amplified nucleic acids are described in Wallace et al., U.S. Pat. No. 5,426,039. Improved methods of amplifying large nucleic acids by PCR are summarized in Cheng et al. (1994) Nature 369: 684-685 and the references therein, in which PCR amplicons of up to 40 kb are generated. One of skill will appreciate that essentially any RNA can be converted into a double stranded DNA suitable for restriction digestion, PCR expansion and sequencing using reverse transcriptase and a polymerase. See, Ausubel, Sambrook and Berger, all supra.

[0241] Host cells can be transduced with nucleic acids of interest, e.g., cloned into vectors, for production of nucleic acids and expression of encoded molecules (these encoded molecules can be used, e.g., as controls to determine a baseline activity against which the encoded activities of a diverse library of nucleic acids are compared). In addition to Berger, Sambrook and Ausubel, a variety of references, including, e.g., Freshney (1994) Culture of Animal Cells, a Manual of Basic Technique, third edition, Wiley-Liss, New York and the references cited therein, Payne et al. (1992) Plant Cell and Tissue Culture in Liquid Systems John Wiley & Sons, Inc. New York, N.Y.; Gamborg and Phillips (eds) (1995) Plant Cell, Tissue and Organ Culture; Fundamental Methods Springer Lab Manual, Springer-Verlag (Berlin Heidelberg New York) and Atlas and Parks (eds) The Handbook of Microbiological Media (1993) CRC Press, Boca Raton, Fla. provide additional details on cell culture, cloning and expression of nucleic acids in cells.

[0242] Sources for physically existent nucleic acids include nucleic acid libraries, cell and tissue repositories, the NIH, USDA and other governmental agencies, the ATCC, zoos, nature and many others familiar to one of skill. For example, a wide variety of samples can be obtained from nature which are suitable for use in the present invention. These include, but are not limited to, environmental isolates from remote, unusual, contaminated or common soils, clays, aquifers and marine localities; high and low moisture environments; living, dead, decayed or partially decayed tissues of plants or animals; environmental isolates containing a plurality of microorganisms; extracts from the gut flora of vertebrates and invertebrates, including symbiotic and endosymbiotic microorganisms. While these diverse sources provide many nucleic acids, there are many others which exist only as a result of computer algorithms as described above, or, even though existent, are difficult to acquire from nature (but often straightforward to synthesize, given an appropriate sequence).

[0243] The second basic method for acquiring nucleic acids does not rely on the physical pre-existence of a nucleic acid. Instead, nucleic acids are generated synthetically, e.g., using well-established nucleic acid synthesis methods. For example, nucleic acids can be synthesized using commercially available nucleic acid synthesis machines which utilize standard solid-phase methods. Typically, fragments of up to about 100 bases are individually synthesized, then joined (e.g., by enzymatic or chemical ligation methods, or polymerase mediated recombination methods) to form essentially any desired continuous sequence or sequence population. For example, the polynucleotides and oligonucleotides of the invention can be prepared by chemical synthesis using, e.g., the classical phosphoramidite method described by Beaucage et al., (1981) Tetrahedron Letters 22: 1859-69, or the method described by Matthes et al., (1984) EMBO J. 3: 801-05, e.g., as is typically practiced in automated synthetic methods. According to the phosphoramidite method, oligonucleotides are synthesized, e.g., in an automatic DNA synthesizer, assembled and, optionally, cloned in appropriate vectors. In addition, essentially any nucleic acid can be custom ordered from any of a variety of commercial sources, such as The Midland Certified Reagent Company (mcrc@oligos.com), The Great American Gene Company (http://www.genco.com), ExpressGen Inc. (www.expressgen.com), Operon Technologies Inc. (Alameda, Calif.) and many others. Similarly, peptides and antibodies (useful in various embodiments noted below) can be custom ordered from any of a variety of sources, such as PeptidoGenic (pkim@ccnet.com), HTI Bio-products, inc. (http://www.htibio.com), BMA Biomedicals Ltd (U.K.), Bio.Synthesis, Inc., Research Genetics (Huntsville, Ala.) and many others.

[0244] Synthetic approaches to nucleic acid generation have the advantage of easy automation. Oligonucleotide synthesis machines can easily be interfaced with a digital system that specifies which nucleic acids are to be synthesized (indeed, such digital interfaces are generally part of standard oligonucleotide synthesis devices). Similarly, ordering nucleic acids from commercial sources can be automated through simple computer programming and use of the internet (e.g., by having the user select nucleic acids which are desired and providing an automated ordering system), with provisions for user inputs (nucleic acid selection) and outputs (synthesis of nucleic acids which are ordered).

[0245] Synthetic approaches can also be used to automate simultaneous sequence acquisition and diversity generation, i.e., through “oligonucleotide shuffling” and related technologies (see also, “OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID RECOMBINATION” by Crameri et al., filed Feb. 5, 1999 (U.S. Ser. No. 60/118,813) and filed Jun. 24, 1999 (U.S. Ser. No. 60/141,049) and filed Sep. 28, 1999 (U.S. Ser. No. 09/408,392, Attorney Docket Number 02-29620US) and Application No. PCT/US00/01203 filed Jan. 18, 2000; and “USE OF CODON-BASED OLIGONUCLEOTIDE SYNTHESIS FOR SYNTHETIC SHUFFLING” by Welch et al., filed Sep. 28, 1999 (U.S. Ser. No. 09/408,393, Attorney Docket Number 02-010070US); and “METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES & POLYPEPTIDES HAVING DESIRED CHARACTERISTICS” by Selifonov et al. filed Feb. 5, 1999 (U.S. Ser. No. 60/118854), U.S. Ser. No. 09/416,375 filed Oct. 12, 1999, Application No. PCT/US00/01202, filed Jan. 18, 2000, and, e.g., U.S. Ser. No. 09/618,579 filed Jul. 18, 2000). In these methods, nucleic acid oligonucleotides corresponding to multiple parental nucleic acids are synthesized, mixed and assembled via polymerase (e.g., PCR) or ligase (or both) mediated methods to produce recombinant nucleic acids which have subsequences corresponding to multiple parental nucleic acid types.

[0246] (2.) Sources and Destinations for Nucleic Acids in the Module

[0247] The assays of the invention are optionally partially or completely performed in a flowing format. That is, nucleic acids or other relevant reaction reagents are optionally flowed from sources (wells, channels, oligonucleotide synthesis elements, etc.) to destinations (reaction wells, channels, arrays, etc.), with reactions optionally being controlled by flowing reactants into contact in the system.

[0248] Thus, the nucleic acids which are selected and/or acquired optionally include one or more sources of one or more nucleic acids which collectively or individually comprise a first population of nucleic acids. The diversified nucleic acids are produced by recombining or otherwise mutating one or more members of the first population of nucleic acids. This source of nucleic acids can be an in vitro, in vivo or virtual (in a digital system, i.e., “in silico”) source.

[0249] Sources of nucleic acids can include at least one nucleic acid, including, e.g., any of: a synthetic nucleic acid, a DNA, an RNA, a DNA analogue, an RNA analogue, a genomic DNA, a cDNA, an mRNA, an nRNA, an aptamer, a cloned nucleic acid, a cloned DNA, a cloned RNA, a plasmid DNA, a viral DNA, a viral RNA, a YAC DNA, a cosmid DNA, a BAC DNA, a P1-mid, a phage DNA, a single-stranded DNA, a double-stranded DNA, a branched DNA, a catalytic nucleic acid, an antisense nucleic acid, an in vitro amplified nucleic acid, a PCR amplified nucleic acid, an LCR amplified nucleic acid, a Qβ-replicase amplified nucleic acid, an oligonucleotide, a nucleic acid fragment, a restriction fragment or any combination thereof, or other nucleic acid forms which are available. Alternately, the sources can be virtual or virtual and synthetic, and can include one or more character strings corresponding to such sources. In addition to virtual sources, data structures (which can be physical or virtual) can be sources of nucleic acids (e.g., by combining character strings with synthetic methods), including diversified nucleic acids.

[0250] In addition to a source of nucleic acid, the module can include a population destination region. During operation of the device, one or more members of the first population are optionally moved from one or more sources of the one or more nucleic acids to the one or more destination regions.

[0251] In general, the devices and systems can include nucleic acid movement means for moving the one or more members from the one or more sources of the one or more nucleic acids to the one or more destination regions (a variety of fluidic and non-fluidic means of moving components are described herein).

[0252] Sources, destinations and source and destination regions can be physically embodied in many different ways. For example, they can be microtiter wells or dishes, fritted microtiter trays (e.g., for coupling to column chromatographic methods), microfluidic systems, microchannels, containers, data structures, computer systems, combinations thereof, or the like. Examples of sources/destinations include solid phase arrays, liquid phase arrays, containers, microtiter trays, microtiter tray wells, microfluidic components, microfluidic chips, test tubes, centrifugal rotors, microscope slides, an organism, a cell, a tissue, and combinations thereof.

[0253] As is noted in more detail herein, the systems of the invention also can similarly include sources of in vitro transcription or translation reagents, where, during operation of the device, an in vitro transcription reagent or an in vitro translation reagent is flowed from a source into contact with nucleic acids to be transcribed/translated. Sources and destinations for other reactants as noted herein are also optionally provided.

[0254] Any of the operations to be performed on individual array members can be performed sequentially or in parallel. As noted throughout, certain physical array formats such as microtiter tray-based approaches are well suited to parallel operations (i.e., having the same or similar operations performed by approximately simultaneous additions of relevant reagents to the array, or approximately simultaneous removal of materials from the array, e.g., for re-plating (e.g., for array duplication), purification of materials, and/or other downstream operations). As discussed herein, conventional high-throughput robotics provide one convenient way of performing these operations, which may, of course, also be provided by manual manipulations, microfluidic approaches, or other available methods. In some array formats, sequential operations are more conveniently performed, e.g., where the array is a logical array with members which are not located in formats that provide for parallel manipulations.

[0255] In either case, robotic or other manipulations can be performed uniformly to the array, or can be selectively performed to individual array members. These manipulations, and the actual motions used to achieve selective or parallel manipulations, can be controlled by appropriate controller devices, e.g., computers linked to robotic elements with software comprising instruction sets for regulating the robotic or other material manipulative elements. The software is optionally user programmable, i.e., to provide for parallel or selective operations, e.g., to select “hits” for further manipulations.

[0256] Generally, as noted herein, master arrays or data sets (or both) can be maintained that preserve information regarding the spatial location of array elements in the system. Generally, duplicate arrays are acted upon by system elements (e.g., reagents are added to or material removed from one or more duplicate array members), rather than the preserved master array members or data set elements.

[0257] In addition to flowable formats, nucleic acids, transcription reagents, translation reagents or other relevant reactants are optionally fixed at one or more sources or at one or more destination regions. In these “fixed” or “partially flowing” formats, reagents can be localized to one or more locations and cognate reagents either fixed in proximity, or flowed (e.g., via pipetting) or otherwise delivered (e.g., via aerosolization, lyophilization, etc.) into contact with reagents of interest.

[0258] Movement means for moving nucleic acids and other reagents include fluid pressure modulators (e.g., pipettors or other pressure-driven channel systems), electrokinetic fluid force modulators, electroosmotic flow modulators, electrophoretic flow modulators, centrifugal force modulators, robotic armatures, pipettors, conveyor mechanisms, stepper motors, robotic plate manipulators, peristaltic pumps, magnetic field generators, electric field generators, fluid flow paths and the like.

[0259] For example, the diversity generating module can include one or more recombination modules which move one or more members of a population of nucleic acids into contact with one another, thereby facilitating recombination of the first population of nucleic acids. Similarly, the diversity generation module can include one or more reaction mixture arraying modules, which move one or more of the one or more diverse (e.g., shuffled) nucleic acids into one or more spatial positions. The system can also provide for moving in vitro transcription/translation reactant components into desired locations in the array of reaction mixtures.

[0260] (3.) Dilution/Concentration Module

[0261] Shuffling/recombination/diversification module(s), or other modules herein, optionally include a dilution or concentration function. In particular, it is often desirable to normalize the level of reactant or product at an array position (e.g., in a duplicate, diluted or concentrated array) so that product activities can be directly compared across an array. This typically involves determining the concentration of products (proteins, nucleic acids, etc.) or reactants (nucleic acids, transcription buffers, translation buffers, etc.) at sites in the array and diluting or concentrating the products or reactants appropriately. The dilution/concentration module or module function can form new diluted arrays or can dilute reactants or products at array sites. For example, the dilution/concentration module can re-array an amplified physical or logical array of polypeptides or in vitro transcribed nucleic acids in a secondary polypeptide or in vitro transcribed nucleic acid array which has an approximately uniform concentration of polypeptides or in vitro transcribed nucleic acids at a plurality of locations in the secondary array.
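As a simple illustration of normalization across an array, the following Python sketch computes how much diluent to add to each well so that every well reaches a common target concentration, using the conservation relation C1·V1 = C2·V2. The function name `diluent_volumes`, the well labels and the units are hypothetical; wells already below the target are flagged with `None`, since they would need to be concentrated rather than diluted.

```python
def diluent_volumes(concentrations, target, sample_volume):
    """Diluent volume to add per well so each well reaches `target`.

    Uses C1 * V1 = C2 * V2 with V1 = sample_volume, so the added volume is
    V1 * (C1 / C2 - 1). Concentrations and the target share one unit;
    volumes are returned in the unit of `sample_volume`. Wells below the
    target map to None (they need concentrating, not diluting).
    """
    plan = {}
    for well, conc in concentrations.items():
        if conc < target:
            plan[well] = None
        else:
            plan[well] = sample_volume * (conc / target - 1.0)
    return plan
```

For instance, with measured concentrations of 40, 10 and 5 units in wells A1, A2 and A3, a target of 10 units and a 5 microliter sample, the plan adds 15 microliters to A1, nothing to A2, and flags A3.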

[0262] To easily recover nucleic acids which encode products of interest, it is generally desirable to limit the number of different nucleic acids at defined sites in an array. For example, when arranged in a microtiter tray or other physical array, e.g., for subsequent amplification or processing, it is useful to dilute or concentrate array members to an average of approximately 0.1-100 nucleic acids (e.g., unique nucleic acids) per well or other storage site. This is particularly relevant at the start of the arraying process following initial extraction, mutagenesis or cloning of member nucleic acids. Typically, nucleic acids are arranged at about 1-10, and often at an average of approximately 1-10 or 1-5 nucleic acids per well prior to amplification. Subsequent amplification in preparation for array duplication can increase this by, e.g., about 2 to about 100 fold or more. In contrast, subsequent amplification for purposes of conducting transcription, translation and/or screening can increase the concentration of member nucleic acids by, e.g., more than about 100-fold.
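The effect of the average number of nucleic acids per well can be estimated with Poisson statistics. The sketch below assumes that molecules are distributed into wells independently and at random, which is an idealization introduced here and not a statement of the method; the function name `occupancy` is hypothetical.

```python
import math


def occupancy(mean_per_well):
    """Expected fractions of empty, single and multiple-occupancy wells.

    ASSUMPTION: the number of molecules per well follows a Poisson
    distribution with the given mean.
    """
    p_empty = math.exp(-mean_per_well)
    p_single = mean_per_well * p_empty
    return {
        "empty": p_empty,
        "single": p_single,
        "multiple": 1.0 - p_empty - p_single,
    }
```

Under this assumption, an average of 1 nucleic acid per well leaves roughly 37% of wells empty and about 26% with more than one molecule, while an average of 0.1 per well makes multiple occupancy rare (under 0.5%) at the cost of about 90% empty wells.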

[0263] The diluter can operate prior to or after diversity generation or between any reaction steps. For example, one embodiment includes a diluter which pre-dilutes one or more shuffled or otherwise diversified nucleic acids (e.g., by diluting members of a population with a buffer prior to arraying the members, e.g., in the reaction mixture arrays herein). In other aspects, the diluter dilutes nucleic acids as part of producing copy arrays from amplified arrays of nucleic acids.

[0264] Typical concentrations for diluted nucleic acids are in the range of about 0.01 to 100 molecules per microliter (although, in certain embodiments where lipid vesicles are used as reaction vessels, this concentration can be somewhat different, as described supra).

[0265] Typical dilution/concentration operations are performed by any available method, including the addition of buffers (e.g., by pipetting), lyophilization, osmosis, precipitation, chromatography and the like.

[0266] In one example, DNA is diluted and aliquotted into wells such that the concentration approaches a statistical approximation of the desired concentration. The DNA is fluorescently labeled, during or after diversity generation, followed by FACS or other fluorescence-based cell sorting. The sorting and isolation of individual DNA fragments is optionally coupled to a dispensing device such as a fraction collector such that a collection array (e.g., microtiter tray) receives about 1 molecule/well. The DNA is affinity tagged such that, e.g., one affinity tag exists per molecule. Subsequent binding to an assay vehicle allows a single dsDNA molecule to bind each compartment in the assay.

[0267] DNA tagging formats include, e.g., 5′ termini DNA/RNA labeling by aminotag phosphoramidites, such as those described in Olejnik et al. (1998) “Photocleavable Aminotag Phosphoramidites for 5′ termini DNA/RNA labeling” Nucleic Acids Res. 26(15): 3572-3576, in which a photocleavable amine can be introduced on the 5′ terminal phosphate and conjugated with a variety of amine-reactive markers such as biotin, digoxigenin or tetramethylrhodamine. The assay vehicles for compartmentalization of affinity tagged dsDNA can bind the DNA to a derivatized microtiter plate directly or to, e.g., beads which are subsequently dispensed at a rate of, e.g., one bead per well. The bound DNA can be used to isolate hybridizing fragments or other hybridizing shuffled variants.

[0268] More than one DNA fragment can be dispensed into separate wells, with the diversity generation and assaying steps being run as small pools of samples of interest. In some cases, this partially pooled approach is preferred, e.g., for assaying larger libraries of diversified nucleic acids, or where the cost of reagents (e.g., transcription/translation reagents) is limiting. However, there are some drawbacks to this approach, such as a dilution of average activity in the wells, inhibition of individual pool members by other members in the wells, etc.

[0269] (4.) Processing of Acquired Nucleic Acids to Increase Diversity: Fragmentation Based Methods

[0270] As noted, the nucleic acid diversity generation (e.g., shuffling) module can permit hybridization of the nucleic acid fragments followed by elongation with a polymerase which elongates the hybridized nucleic acid. Several (though not all) diversity generation methods rely initially on the production of fragmented DNA. In general, one or more shuffled nucleic acid(s) can be produced by synthesizing a set of overlapping oligonucleotides, or by cleaving a plurality of homologous nucleic acids to produce a set of cleaved homologous nucleic acids, or both, and permitting recombination to occur between the set of overlapping oligonucleotides, the set of cleaved homologous nucleic acids, or a combined set of overlapping oligonucleotides and set of cleaved homologous nucleic acids. Fragmented DNA is recombined, e.g., taking advantage of hybridization and PCR or LCR gene reconstruction methods described in the references above to produce full-length, diversified recombinant nucleic acid libraries. These libraries are optionally screened for the expression of products of interest. Thus, the diversity module optionally fragments input nucleic acids to produce nucleic acid fragments, or the input nucleic acids can themselves include cleaved or synthetic nucleic acid fragments.

[0271] A number of automated approaches can be used to produce “fragmented” nucleic acids. Fragmented nucleic acids can be provided by mechanically shearing nucleic acids, by enzymatically or chemically cleaving nucleic acids, by partially synthesizing nucleic acids, by random primer extending or directed primer extending double-stranded or single-stranded nucleic acid templates, by incorporating cleavable elements into the nucleic acids during synthesis, or the like. Templates or starting materials for such procedures include naturally occurring nucleic acids, synthetic nucleic acids, DNA in any form, RNA in any form, DNA analogues, RNA analogues, genomic DNAs, cDNAs, mRNAs, nRNAs, cloned nucleic acids, cloned DNAs, cloned RNAs, plasmid DNAs, viral DNAs, viral RNAs, YAC DNAs, cosmid DNAs, branched DNAs, DNA and/or RNA isolated from heterogeneous microbial populations, catalytic nucleic acids, antisense nucleic acids, in vitro amplified nucleic acids, PCR amplified nucleic acids, LCR amplified nucleic acids, SDA nucleic acids, Qβ-replicase amplified nucleic acids, nucleic acid sequence-based amplified (NASBA) nucleic acids, transcription-mediated amplified (TMA) nucleic acids, oligonucleotides, nucleic acid fragments, restriction fragments, combinations thereof and any other available material. Nucleic acids can be partially or substantially purified prior to fragmentation, or can be unpurified.

[0272] For example, nucleic acids can be fragmented enzymatically, e.g., DNA can be fragmented using a nuclease such as a DNAse. In the context of the present invention, a fragmentation module can include containers such as microtiter plates or microfluidic chips into which parental nucleic acids (e.g., homologous DNAs) are dispensed, mixed and fragmented by the addition of DNAse. In addition, the fragmentation module is optionally operably coupled to a programmed thermocycler and/or computer for directing fragmentation. For example, a computer is used to calculate conditions for fragmentation that produce desired length fragments. For example, when uracil incorporation and cleavage is used to produce nucleic acid fragments, a computer optionally calculates the amount of uracil residues to be used in relation to thymidine residues, e.g., based on user input comprising the desired fragment length. The reaction is allowed to proceed for a selected period of time, or in parallel reactions having different time periods, to produce one or multiple sets of nucleic acid fragments. The addition of DNAse or other cleavage enzymes can occur before or after dispensing the parental nucleic acids into one or more systems which facilitate downstream processing (e.g., prior to dispensing into microwell plates, microchips, or the like). The nucleic acid fragments can be contacted to one another in a single pool, or in multiple pools.

[0273] Alternately, or in combination, nucleic acids are mechanically sheared, e.g., by vortexing, sonicating, point-sink shearing or other similar operations, before or after addition to the one or more systems which facilitate downstream processing. Mechanical shearing of nucleic acids has the advantage of being largely sequence independent, which, at times, is desirable, e.g. where no bias is desired in the sheared nucleic acid fragments. For example, the point-sink shearing method is described in Thorstenson et al., (1998) “An Automated Hydrodynamic Process for Controlled, Unbiased DNA shearing,” Genome Research 8: 848-855. Basically, this method consists of forcing a solution of DNA into a narrowed region of a channel, putting sufficient force on the DNA to break it up. Although this method typically generated relatively large DNA fragments (500-1000 bp), the size of fragments can be reduced by increasing the velocity of the solution, decreasing the size of the channel, vibrating the channel, e.g., at the channel entrance (e.g., using a circular piezo-electric device), or the like.

[0274] In a second alternate embodiment, nucleic acids are “fragmented” by synthesis of fragments (rather than cleavage) which correspond in sequence to subsequences of one or more parental nucleic acids. For example, synthetic oligonucleotide “fragments” can be made in an automatic synthesizer which correspond to any sequence of interest. This method has the advantage of easy combination with in silico approaches (e.g., in silico recombination of character strings can be performed, followed by synthesis of the oligonucleotides which correspond to any desired character string). Indeed, the oligonucleotides which are generated can provide any desired diversity in products which are formed using the oligonucleotides; thus, sequence acquisition and at least a first round of diversity generation can be performed simultaneously. Further details regarding oligonucleotide synthetic approaches and “in silico” shuffling approaches are found in “OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID RECOMBINATION” by Crameri et al., supra., and “USE OF CODON-BASED OLIGONUCLEOTIDE SYNTHESIS FOR SYNTHETIC SHUFFLING” by Welch et al., supra., and “METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES & POLYPEPTIDES HAVING DESIRED CHARACTERISTICS” by Selifonov et al., supra., and further details on these methods are also found, supra.

[0275] In a third and also preferred embodiment, DNA fragmentation is achieved via incorporation of cleavage targets into nucleic acids of interest. In this embodiment, modified nucleotides or other structures are incorporated into nucleic acids during synthesis (whether chemical, enzymatic, or both) of the nucleic acids. These modified nucleotides or other structures become cleavage points within a nucleic acid into which they are incorporated. One example of this approach is described, e.g., in PCT US9/619,256. As noted in the '256 application, nucleic acid synthesis can be conducted to produce nucleic acids of interest (e.g., via PCR, e.g., using a computer or computer program to calculate the uracil/thymidine ratio necessary to produce nucleic acid fragments of a desired size or synthetic methods), incorporating uracil into the nucleotides in a stochastic or directed fashion. The PCR products are then fragmented by digestion with UDG-glycosylase, which forms strand breaks at the uracil residues. Further details on this procedure are found below.

[0276] Similarly, RNA nucleotides can be incorporated into DNA chains (synthetically or via enzymatic incorporation); these nucleotides then serve as targets for cleavage via RNA endonucleases. A variety of other cleavable residues are known, including certain residues which are specific or non-specific targets for enzymes, or other residues which serve as cleavage points in response to light, heat or the like. Where polymerases are currently not available with activity permitting incorporation of a desired cleavage target, such polymerases can be produced using shuffling methods to modify the activity of existing polymerases, or to acquire new polymerase activities.

[0277] Simple chain termination methods can also be used to produce nucleic acid fragments, e.g., by incorporating dideoxy nucleotides into the reaction mixture(s) of interest.

[0278] In any case, once fragmentation is performed to the extent desired, the reaction is transferred to a recombination/resynthesis module. This module optionally dispenses resulting elongated nucleic acids into one or more multiwell plates, or onto one or more solid substrates, or into one or more microscale systems, or into one or more containers for further operations by the system.

[0279] In one embodiment, diversity generation module(s) (or any other module herein) can include a fragment length purification portion which purifies selected length fragments of the nucleic acid fragments. Fragment purification can be performed by electrophoresis (e.g., gel electrophoresis), column chromatography, incorporation of a label, incorporation of a purification tag, or any other currently available method.

[0280] As noted above, the diversity module also optionally dilutes or concentrates nucleic acids (e.g., produced by elongation of fragment populations) and dispenses them. For example, elongated nucleic acids produced after PCR or ligase-mediated gene reconstruction can be dispensed into one or more multiwell plates or other array configurations at a selected density per well (or chamber, channel, container, etc., depending on the configuration) of the elongated nucleic acids. This dilution/concentration function is useful in normalizing assay results. That is, having array members at similar (or otherwise defined) concentrations permits analysis of results (e.g., concentration or activity levels of products). Similarly, where product concentrations are different, it is useful to dilute or concentrate products to similar or at least defined concentrations to facilitate result interpretation.
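The normalization step described above can be illustrated with a minimal sketch based on the standard dilution relation C1·V1 = C2·V2. The function name, the well labels, the concentrations and the target values are hypothetical and chosen only for illustration; they are not taken from the system described herein.

```python
def dilution_volumes(stock_conc, target_conc, final_volume_ul):
    """Return (sample_ul, diluent_ul) bringing a sample to target_conc via C1*V1 = C2*V2."""
    if stock_conc <= 0 or target_conc <= 0:
        raise ValueError("concentrations must be positive")
    if target_conc > stock_conc:
        raise ValueError("sample is below target; concentrate it instead of diluting")
    sample_ul = target_conc * final_volume_ul / stock_conc
    return sample_ul, final_volume_ul - sample_ul


# Hypothetical per-well concentrations (ug/ul), normalized to 0.10 ug/ul in 50 ul.
wells = {"A1": 0.70, "A2": 0.35, "A3": 0.34}
for well, conc in wells.items():
    sample_ul, diluent_ul = dilution_volumes(conc, 0.10, 50.0)
    print(f"{well}: {sample_ul:.1f} ul sample + {diluent_ul:.1f} ul diluent")
```

Wells whose concentration falls below the target raise an error in this sketch, since such wells would need to be concentrated rather than diluted.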

[0281] In one embodiment, the device or integrated system includes a nucleic acid fragmentation module and a recombination region. The fragmentation module includes, e.g., a nuclease, a mechanical shearing device, a polymerase, a random primer, a directed primer, a nucleic acid cleavage reagent, a chemical nucleic acid chain terminator, an oligonucleotide synthesizer, or other element for producing fragmented nucleic acids as described above. During operation of the device, fragmented DNAs or other nucleic acids produced in the fragmentation module are recombined in the recombination region (a well, channel, chamber or other container or substrate or surface) to produce one or more shuffled nucleic acids.

[0282] As noted, fragments (or full-length nucleic acids in other modules herein) are often purified prior to further operations by the system. This purification can incorporate any of the purification methods common to DNA or RNA purification, including electrophoresis (in gels, capillary channels, etc.), chromatography or the like.

[0283] An Improved StEP

[0284] The effectiveness of DNA shuffling by staggered extension process (StEP) depends in certain formats in part on the rapidity of thermocycling between denaturation and extension steps. Very rapid thermocycling can be used to limit extension. The more limited the extension, the smaller the resulting fragments and the finer the “granularity” of the resulting recombination. Controlled incorporation of uracil into parental templates, followed by treatment with uracil glycosylase to generate AP sites, provides an alternate method of controlling fragment size. The granularity of recombination is controlled, e.g., by the frequency of apurinic sites in parental templates, as these sites serve as replication terminators in the StEP reaction. A further improvement uses a thermostable uracil glycosylase and dUTP in the StEP reaction to add replication terminators to newly synthesized DNA fragments, assuring recombination throughout the StEP reaction.

[0285] Fragmentation Example: Ung-End Fragmentation: Use in Single-Tube DNA Shuffling Reactions

[0286] This example describes single-tube DNA shuffling according to the present invention including simplification of DNase enzymatic fragmentation, size fractionation and purification of DNA by agarose gel electrophoresis or other procedures. An alternative to laborious and hard-to-control standard fragmentation protocols includes the use of controlled uracil incorporation into starting DNA, e.g., via PCR with dUTP, followed by fragmentation of the uracil-containing DNA with two enzymes: Uracil N-Glycosylase (Ung), which hydrolyzes the N-glycosidic bond between the deoxyribose sugar and uracil to generate apurinic (or AP) sites, followed by the use of a 5′ AP endonuclease, such as Endonuclease IV (End), which cleaves a single strand of DNA 5′ to AP sites, leaving a 3′-hydroxy-nucleotide and 5′-deoxyribose phosphate termini. See also, Friedberg et al. (1995) DNA Repair and Mutagenesis. pp. 1-698. ASM Press. Washington, D.C.

[0287] A fundamental advantage of Ung-End fragmentation over DNAse I treatment is that fragmentation is simply a function of uracil content (which is easily controlled in PCR), rather than time of reaction and size of DNA (which is difficult to control). Size fractionation and purification may be obviated by the use of Ung-End fragmentation, since the reaction goes to completion, with the average fragment size being a function of uracil content only. Note that, as with conventional DNase fragmentation and size fractionation, Ung-End fragmentation is used for shuffling a single DNA sequence or family of related DNA sequences. The use of Ung-End fragmentation along with PCR assembly provides for single-tube DNA shuffling, which can be carried out, e.g., in microtiter plates.
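Because the average fragment size is described as a function of uracil content only, a computer can in principle relate a desired fragment length to a dUTP:dTTP ratio. The following is a minimal sketch under loudly stated assumptions that are not part of the source: thymidine positions make up a fixed fraction of the strand (25% by default), the polymerase incorporates dUTP and dTTP without preference, every incorporated uracil is cleaved, and only a single strand is considered. In practice the relationship would be calibrated empirically; the function names are hypothetical.

```python
def mean_fragment_length(dutp_fraction, t_fraction=0.25):
    """Expected single-strand fragment length (nt) for a given dUTP/(dUTP+dTTP) fraction.

    Assumes cleavage at every uracil and unbiased incorporation (illustrative model only).
    """
    if not 0.0 < dutp_fraction <= 1.0:
        raise ValueError("dutp_fraction must be in (0, 1]")
    cleavage_sites_per_nt = t_fraction * dutp_fraction
    return 1.0 / cleavage_sites_per_nt


def dutp_fraction_for_length(target_length_nt, t_fraction=0.25):
    """Invert the model: dUTP fraction expected to give the target mean length."""
    fraction = 1.0 / (t_fraction * target_length_nt)
    if fraction > 1.0:
        raise ValueError("target length is shorter than this model allows")
    return fraction


for target in (20, 50, 100):
    print(target, "nt ->", round(dutp_fraction_for_length(target), 3), "dUTP fraction")
```

The two functions are inverses of one another, so an empirically fitted correction could replace the simple reciprocal without changing how the calculation is used.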

[0288] Important considerations in the design of a single-tube shuffling reaction include methods for minimizing carry-over of the plasmid template DNA used to generate uracil-containing DNA for shuffling. A simple solution is to incorporate uracil into the plasmid template via growth in a dut-1 ung-1 double mutant of Escherichia coli, such as strain CJ236 (Warner et al. (1981) “Synthesis and metabolism of uracil-containing deoxyribonucleic acid in Escherichia coli” J. Bacteriol. 145(2): 687-695; Kunkel et al. (1987) “Rapid and efficient site-specific mutagenesis without phenotypic selection” Methods Enzymol. 154: 367-382) or by PCR. Likewise, incorporation of uracil into primers for generating uracil-containing DNA minimizes carry-over of primers into the assembly reaction. Reduction in transformation efficiency of shuffled product using Ung-End fragmentation can result due to residual uracil. Where this is problematic, transformation of shuffled products into an ung mutant of E. coli assists in cloning processes.

[0289] Growth of plasmid in a dut-1 ung-1 E. coli mutant (e.g. strain CJ236) for uracil incorporation followed by Ung-End fragmentation and PCR assembly provides a quick, single-tube method of shuffling a whole plasmid or family of plasmids. Growth of plasmids in an E. coli dut ung strain bearing a strong mutator allele (e.g. dut ung mutD5) or combination of mutator alleles for in vivo mutagenesis, as well as uracil incorporation into plasmid DNA, coupled with Ung-End fragmentation and PCR assembly is a powerful and simple means of rapidly evolving the function of a plasmid. Uracil content of plasmid DNA (and consequently average fragment size following Ung-End fragmentation) following growth in a dut ung strain is modulated by the addition of exogenous uridine or thymidine. In addition, uracil content is affected using strains bearing alternative dut and/or ung alleles, such as the leaky dut-4 allele for less frequent uracil incorporation (Hays et al. (1981) “Recombination of uracil-containing Lambda bacteriophages” J. Bacteriol. 145(1): 306-320) or by using other alleles which affect cellular dUTP levels or uracil incorporation or removal from DNA. Also, plasmid multimers generated by Ung-End fragmentation and PCR assembly of uracil-containing plasmid can be directly transformed into naturally competent bacteria, such as Bacillus subtilis 168 derivatives, which are more efficiently transformed by plasmid multimers.

[0290] Note that uracil glycosylases and 5′ AP endonucleases are ubiquitous. They have been characterized in both eukaryotic and prokaryotic cells, as well as viruses (Friedberg et al. (1995) DNA Repair and Mutagenesis. pp. 1-698. ASM Press. Washington, D.C.). Many of these can be used for Ung-End fragmentation.

[0291] In addition to cleaving 5′ to AP sites, AP nucleases (such as Exonuclease III, Endonuclease IV, and Endonuclease V) recognize and cleave DNA at sites damaged by oxidizing agents or alkylating agents. Endonuclease V additionally cleaves DNA at A/C and A/A mismatches and at deoxyinosine. Thus, the use of controlled dITP incorporation (e.g., during oligonucleotide synthesis used in construction of the nucleic acid of interest) and Endonuclease V treatment enables a single enzyme method for DNA fragmentation. Reagents and bacterial strains for Ung-End fragmentation can easily be incorporated along with PCR reagents into a simple DNA shuffling kit.

[0292] Amplifications with Decreasing Uracil Concentrations:

[0293] The following protocol provides an illustrative example of performing amplifications at multiple uracil concentrations. In an automated process, e.g., in an integrated diversity generation device, appropriate uracil concentrations are optionally calculated, e.g., based on empirical data, to produce a desired fragment length and optimize diversity generation. For example, a programmed thermocycler is optionally used to create appropriate nucleic acids for shuffling, e.g., having a desired amount of uracil incorporation. The programmed thermocycler can be operably coupled to a fragmentation device that produces fragments of a desired length from the uracil containing nucleic acids. The fragments are then used to generate diverse nucleic acids.

[0294] First, 50 μl 10 mM dUTP Stock Mixtures are prepared for a dUTP titration. The mixtures are prepared from 100 mM dNTP stocks as follows (volumes in μl):

                10 mM   8 mM    6 mM    4 mM    2 mM    0 mM
                dUTP    dUTP    dUTP    dUTP    dUTP    dUTP
100 mM dGTP     5       5       5       5       5       5
100 mM dCTP     5       5       5       5       5       5
100 mM dATP     5       5       5       5       5       5
100 mM dTTP     0       1       2       3       4       5
100 mM dUTP     5       4       3       2       1       0
smp H₂O         30      30      30      30      30      30
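The stock-mixture volumes above follow from simple arithmetic: each nucleotide position totals 10 mM in 50 μl, with dUTP replacing dTTP stepwise. The sketch below recomputes the volumes; the function name and the dictionary keys are hypothetical, and the calculation assumes, as the listed volumes suggest, that dTTP plus dUTP always sums to the same concentration as each of the other dNTPs.

```python
def dutp_titration(dutp_levels_mM, stock_mM=100.0, mix_volume_ul=50.0, each_dntp_mM=10.0):
    """Return per-mixture pipetting volumes (ul) for a dUTP/dTTP titration."""
    per_dntp_ul = each_dntp_mM * mix_volume_ul / stock_mM  # 5 ul of each 100 mM stock
    rows = []
    for dutp_mM in dutp_levels_mM:
        if not 0.0 <= dutp_mM <= each_dntp_mM:
            raise ValueError("dUTP level must lie between 0 and the per-dNTP concentration")
        dutp_ul = dutp_mM * mix_volume_ul / stock_mM
        dttp_ul = per_dntp_ul - dutp_ul  # dTTP makes up the remainder
        water_ul = mix_volume_ul - (3 * per_dntp_ul + dutp_ul + dttp_ul)
        rows.append({"dUTP_mM": dutp_mM, "dGTP": per_dntp_ul, "dCTP": per_dntp_ul,
                     "dATP": per_dntp_ul, "dTTP": dttp_ul, "dUTP": dutp_ul,
                     "water": water_ul})
    return rows


for row in dutp_titration([10, 8, 6, 4, 2, 0]):
    print(row)
```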

[0295] Second, amplification reactions are set up as follows:

                                        /100 μl   /800 μl
smp H₂O                                 45 μl     360 μl
3.3 X TthXL Buffer                      33        264
25 mM MgOAc                             10        60
10 mM dNTP Mix                          4         32
20 pmol/μl Protease Forward             2.5       20
20 pmol/μl Protease Reverse             2.5       20
~100 ng/μl plasmid p3RcCl12 (XL1-Blue)  1         6
2 U/μl TthXL                            2         16
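Scaling a per-reaction recipe to a multi-reaction master mix is a calculation that an automated fluid handler would typically perform. The following is a minimal sketch using the per-100 μl volumes listed above; the function names, the shortened component labels and the optional overage parameter are hypothetical additions for illustration.

```python
# Per-100 ul volumes (ul) taken from the reaction listing above; labels are shortened.
PER_100_UL = {
    "smp H2O": 45.0,
    "3.3X TthXL Buffer": 33.0,
    "25 mM MgOAc": 10.0,
    "10 mM dNTP Mix": 4.0,
    "Protease Forward primer": 2.5,
    "Protease Reverse primer": 2.5,
    "plasmid template": 1.0,
    "TthXL polymerase": 2.0,
}


def scale_recipe(recipe, n_reactions, overage=0.0):
    """Multiply every component by the reaction count, plus optional fractional overage."""
    factor = n_reactions * (1.0 + overage)
    return {name: volume * factor for name, volume in recipe.items()}


def without(recipe, omitted):
    """Return a copy of the recipe lacking one component (e.g., the dNTP mix)."""
    return {name: volume for name, volume in recipe.items() if name != omitted}


reaction_mix = without(PER_100_UL, "10 mM dNTP Mix")
print("per-tube Reaction Mix volume:", sum(reaction_mix.values()), "ul")
print(scale_recipe(reaction_mix, 6))
```

Omitting the dNTP mix leaves 96 μl per tube, which matches the aliquot volume used when the dNTP mixes are added separately to each tube.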

[0296] Third, Reaction Mixes are prepared with all components except the dNTP Mix. 96 μl of Reaction Mix are aliquoted into, e.g., 6 PCR tubes. 4 μl of each of the dNTP Mixes are added to samples of Reaction Mix. The tubes are placed in a Stratagene RoboCycler using the following settings:

 1x    2 min @ 94° C.    30 sec @ 50° C.    1 min @ 72° C.
29x    30 sec @ 94° C.   30 sec @ 50° C.    1 min @ 72° C.
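A programmed thermocycler needs the cycling settings in machine-readable form. The sketch below encodes the settings given above as a simple nested list and computes the total programmed hold time; the data layout and function name are hypothetical, and ramp times between temperatures are deliberately ignored because they depend on the instrument.

```python
# Each block: (number of cycles, [(hold seconds, temperature in deg C), ...]).
PROGRAM = [
    (1, [(120, 94), (30, 50), (60, 72)]),
    (29, [(30, 94), (30, 50), (60, 72)]),
]


def total_hold_seconds(program):
    """Sum of all programmed holds, excluding instrument ramp times."""
    return sum(cycles * sum(seconds for seconds, _temp in steps)
               for cycles, steps in program)


seconds = total_hold_seconds(PROGRAM)
print(f"programmed hold time: {seconds} s ({seconds / 60:.1f} min)")
```

For the settings above this gives 3690 s (61.5 min) of programmed holds across 30 cycles.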

[0297] Finally, 10 μl of each amplification is run on a standard 0.7% Agarose/TBE gel or other separation system.

[0298] Enzymatic Treatment with Uracil N-Glycosylase and/or Endonuclease IV:

[0299] 10 μl of the 0.32, 0.24, 0.16, and 0.0 mM dUTP reactions are aliquoted into 4 wells of a PCR strip. No enzyme is added to the first aliquot, 0.5 μl of 1 U/μl HK™-UNG N-Glycosylase (Epicentre Technologies) is added to the second, 0.5 μl of 2 U/μl E. coli Endonuclease IV (Epicentre Technologies) is added to the third aliquot, and 0.5 μl of each enzyme is added to the fourth aliquot. The reactions are incubated for 2 hours at 37° C. The reactions are then heated for 10 min at 94° C., then placed on ice. 10 μl of each reaction are then run on a 1.5% Agarose/TBE gel.

[0300] Assembly of Fragments:

[0301] Uracil titrations and 100 μl amplifications are repeated to generate more test DNA. The QIAGEN QIAquick PCR Purification Kit is used to remove primers and unused dNTPs from reactions according to QIAGEN's instructions, eluting with 55 μl of smp water. The following is added to all 6 PCR reactions to bring to 100 μl total volume:

                         /100 μl
smp water                7 μl
Reaction in smp water    50
3.3 X TthXL Buffer       33
25 mM MgOAc              10

[0302] To 50 μl of each of the 6 reactions, 2.5 μl of 1 U/μl HK™-UNG N-Glycosylase and 2 U/μl E. coli Endonuclease IV is added. The reactions are incubated for 2 hours at 37° C., then for 10 min at 94° C., and then cooled to 4° C. in a Thermocycler. Untreated reactions are saved for agarose gel analysis. 25 μl of each reaction is removed and saved for agarose gel analysis. To the remaining 25 μl, 25 μl of the following Assembly Mix is added:

                           /100 μl   /200 μl
smp water                  45 μl     90
3.3 X TthXL Buffer         33        66
25 mM MgOAc                10        20
10 mM dNTP Mix (no Ura)    8         16
2 U/μl TthXL               4         8

[0303] The reactions are placed in a Stratagene RoboCycler using the following settings:

 1x    2 min @ 94° C.    30 sec @ 50° C.    1 min @ 72° C.
29x    30 sec @ 94° C.   30 sec @ 50° C.    1 min @ 72° C.

[0304] For each uracil concentration, 10 μl of the original PCR reaction, 10 μl of fragments, and 10 μl of assembly reaction are run on a 1.5% Agarose/TBE gel.

[0305] Fragments from the assembly reaction are rescued using PCR with nested primers in 100 μl reactions.

[0306] Ung-End Fragmentation of E. coli dut ung Grown Plasmid DNA:

[0307] Electrocompetent E. coli strain CJ236 (pCJ105 (Cam^(r) F′)// dut1 ung1 thi-1 relA1) is prepared as follows. Strain CJ236 is streaked on LB+30 μg/ml chloramphenicol and incubated overnight at 37° C. Cells are scraped from the plate into 5 ml LB and inoculated into 250 ml LB to a starting OD₆₀₀ of 0.100. The culture is shaken at 37° C. The culture is placed on ice for ~30 min when at OD₆₀₀ 0.4-0.5 and prepared via standard electrocompetence procedures, freezing in 220 μl aliquots in 10% Glycerol.

[0308] Transformation of strain CJ236 with plasmid is performed as follows. 0.5 μg of plasmid is added into 100 μl of electrocompetent strain CJ236 via a standard electroporation protocol. 10⁻¹ to 10⁻⁴ dilutions are plated on LB+100 μg/ml Ampicillin and incubated overnight at 37° C. A transformation efficiency of about 2×10⁸ transformants/μg plasmid is observed. 8 transformants are patched to an LB+Amp100 stock plate and incubated overnight at 37° C. CJ236 is inoculated into 3 ml LB Broth+Amp100, unsupplemented, and supplemented with 500 μg/ml Uridine (to see if fragment size is modulated by supplementation). The cultures are shaken overnight at 37° C. Plasmid DNA is prepared from 1.5 ml with the aid of a Qiagen Miniprep Spin Kit, suspending plasmid DNA in 50 μl smp water. A₂₆₀ and A₂₈₀ of a 1:20 dilution in smp water are read and quantitated. Plasmid in CJ236 in LB+Amp100=0.34 μg/μl; plasmid in CJ236 in LB+Amp100+Ura500=0.35 μg/μl; plasmid in XL1-Blue in LB+Amp100=0.7 μg/μl.
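Quantitation from absorbance readings can be sketched with the widely used approximation that one A₂₆₀ unit of double-stranded DNA corresponds to about 50 μg/ml. The absorbance readings in the example below are hypothetical values back-calculated to illustrate the arithmetic; they are not reported in the source, and the function names are assumptions.

```python
DSDNA_UG_PER_ML_PER_A260 = 50.0  # common approximation for double-stranded DNA


def dsdna_conc_ug_per_ul(a260, dilution_factor):
    """Concentration of the undiluted sample in ug/ul from a diluted A260 reading."""
    ug_per_ml = a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor
    return ug_per_ml / 1000.0


def purity_ratio(a260, a280):
    """A260/A280 ratio, commonly used as a rough indicator of protein contamination."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


# Hypothetical reading of a 1:20 dilution.
print(round(dsdna_conc_ug_per_ul(0.34, 20), 3), "ug/ul")
print(round(purity_ratio(0.34, 0.18), 2), "A260/A280")
```

With a 1:20 dilution the conversion factor (50 × 20 / 1000) equals one, so the A₂₆₀ reading is numerically equal to the concentration in μg/μl.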

[0309] Fragmentation Example: Automated DNA Fragmentation Using DNase-Plastic Co-polymers

[0310] Fragmentation is currently performed by the addition of DNase I to DNA in solution. This can result in variable fragmentation. For example, PCR products are often fragmented less well than plasmids, presumably as a result of residual salts following purification of the PCR product. This example provides an automated process in which DNA is fragmented and specific sized fragments are purified, speeding the process greatly.

[0311] Immobilized DNase on support resin beads can be used for fragmentation, with DNA to be fragmented passing over a column made of the beads. This avoids the problem of salts in the solution which are removed by gel filtration.

[0312] An extension of this procedure is to encapsulate the DNase in a polymeric (plastic) resin. Wang et al. (1997) “Biocatalytic plastics as active and stable materials for biotransformations” Nat Biotechnol 15(8): 789-93 and the references therein describe biocatalytic plastic technology generally. Resin encapsulation has the advantage of stabilizing the enzyme greatly: no loss of activity is seen even after 30 or more days. Synthesis of a stable DNase resin avoids the need to re-calibrate the column to account for the loss of activity. Using a fixed initial concentration of DNA, DNA fragment size can be determined by the flow rate through the column. Fractions can be collected containing known fragment sizes.

[0313] Encapsulated DNAse resin can then be used as a component of an automated DNA shuffling system as set forth herein. That is, fragmentation can be performed in a flowing fashion, across DNAse or other nuclease columns. This flow-through fragmentation can be performed in an “in line” or “off-line” fashion. For example, the columns can be incorporated into the fluid handling module(s) herein and performed as part of a fluid transfer of material to be fragmented (in line fragmentation). Alternately, fragmentation columns can be a separate module in the system.

[0314] Although described above in terms of columns for purposes of illustration, it will be appreciated that non-column based methods can utilize particle-bound or encapsulated nucleases, e.g., in a bead panning or chip-based format.

[0315] (5.) Recombination/Resynthesis/Amplification Module

[0316] The recombination/resynthesis module permits hybridization of complementary (or partially complementary) nucleic acids, followed by PCR-based resynthesis of hybridized nucleic acids, typically using multiple cycles of PCR (a variety of PCR-based re-synthesis methods, including staggered extension process (“StEP”) PCR are set forth in the references above), or ligation (e.g., via LCR). In general, PCR can be used to “sew” sets of overlapping nucleic acids together, simply by performing multiple cycles of PCR on overlapping nucleic acid fragments. Similarly, ligases can be used to ligate overlapping (or even non-overlapping) nucleic acid fragments (with or without a polynucleotide extension (e.g. polymerase-mediated) step between cycles of ligation). Where PCR is used, the recombination/resynthesis module also optionally performs nucleic acid amplification, i.e., by PCR.

[0317] The amplification of arrays and duplicate arrays is also an important feature of the invention, as this amplification provides material for subsequent operations (2^(nd) round diversity generation reactions such as shuffling, cloning, sequencing, etc.). For example, a duplicate amplified array can be formed by copying a master array, or a portion thereof, and generating amplicons of the members of the resulting duplicate array to form an amplified array of nucleic acids. Any available amplification methods can be used, including amplifying nucleic acids in physical or logical arrays by PCR, LCR, SDA, NASBA, TMA, Qβ-replicase amplification, etc.

[0318] Common physical elements for the resynthesis module include heating and optionally cooling elements to perform PCR, and containers to hold nucleic acids to be resynthesized (microtiter trays, chips, test tubes, etc.). For example, standard PCR thermocyclers can be incorporated into this module, i.e., in combination with appropriate instruction sets to perform synthesis, recombination and amplification. For example, a set of instructions is optionally embodied in a programmed thermocycler, a computer operably coupled to a thermocycler, or in a web page that can be used to instruct a thermocycler. The set of instructions typically receives user input data and sets up cycles to be performed on the thermocycler, e.g., a programmed thermocycler. The user input data typically includes one or more parental nucleic acid sequence, a desired crossover frequency, an extension temperature, and/or an annealing temperature, and the like. From such user input data, a set of instructions, e.g., embodied in a computer readable medium, creates a cycle which is performed by the programmed thermocycler. For example, a set of instructions optionally sets up a cycle to amplify one or more parental nucleic acid sequence and fragment the one or more parental nucleic acid sequence to produce one or more nucleic acid fragment. In some embodiments, the cycle is programmed or instructed to pause before fragmenting to allow the addition of fragmentation enzymes, e.g., to fragment nucleic acids that have had uracil residues incorporated therein. The fragments are then reassembled to produce one or more shuffled nucleic acid, which is optionally amplified, all according to the set of instructions or calculations.
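One way such a set of instructions might be organized is sketched below: user input is collected in a small record, and a program of amplify, pause, fragment and reassemble stages is generated from it. Every class, field and function name here is hypothetical, the desired crossover frequency is stored but not interpreted, and the default times and temperatures are simply borrowed from the Ung-End example above; this is an illustrative data structure, not the instruction set of any actual thermocycler.

```python
from dataclasses import dataclass
from typing import List, Tuple

Step = Tuple[str, float, int]           # (label, temperature in deg C, hold seconds)
Stage = Tuple[str, int, List[Step]]     # (stage name, repeat count, steps)


@dataclass
class ShufflingInput:
    parental_sequences: List[str]
    annealing_temp_c: float = 50.0
    extension_temp_c: float = 72.0
    desired_crossover_frequency: float = 0.0  # carried along; unused in this sketch
    pause_for_enzyme: bool = True


def build_program(user_input: ShufflingInput, cycles: int = 30) -> List[Stage]:
    """Translate user input into an ordered list of thermocycler stages."""
    if not user_input.parental_sequences:
        raise ValueError("at least one parental sequence is required")
    pcr_cycle: List[Step] = [
        ("denature", 94.0, 30),
        ("anneal", user_input.annealing_temp_c, 30),
        ("extend", user_input.extension_temp_c, 60),
    ]
    program: List[Stage] = [("amplify", cycles, pcr_cycle)]
    if user_input.pause_for_enzyme:
        # Hold so that fragmentation enzymes can be added before digestion.
        program.append(("pause", 1, []))
    program.append(("fragment", 1, [("digest", 37.0, 7200), ("inactivate", 94.0, 600)]))
    program.append(("reassemble", cycles, pcr_cycle))
    return program


for name, repeats, steps in build_program(ShufflingInput(["ACGT", "ACGA"])):
    print(name, repeats, steps)
```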

[0319] Amplifiers typically include some sort of heating element and can also include a cooling element. Such elements commonly include (but are not limited to) resistive elements, programmable resistors, micromachined zone heating chemical amplifiers, Peltier solid state heat pumps (see, e.g., http://pw1.netcom.com/˜sjnoll/peltier.html), heat pumps, resistive heaters, refrigeration units, heat sinks, Joule-Thomson cooling devices, a heat exchanger, a hot air blower, etc. Any of the above elements are optionally operably coupled to a computer comprising a set of instructions which directs or instructs the elements in the amplification process, e.g., according to user input data or computer calculated predictions.

[0320] Recently, attempts have been made to shorten the time required for each cycle of PCR, an advantage in the present method, in that reduction in this time increases the overall throughput of the system. Such methods often reduce the time by, for example, performing the PCR in devices that allow rapid temperature changes. The use of apparatus that allows greater heat transfer, e.g. incorporating thin-walled tubes, turbulent air-based machines, and the like also facilitates the use of shorter cycle times. For example, the RapidCycler™, from Idaho Technologies, Inc. (http://www.idahotech.com/; Salt Lake City, Utah) allows relatively rapid ramping times between each temperature of a PCR and relatively efficient thermal transfer from the cycler to the samples. Similarly, the RAPID (Ruggedized Advanced Pathogen Identification Device) from Idaho Technologies, Inc. provides a thermal cycler with concurrent fluorescence monitoring to speed analysis as well.

[0321] As an alternative or adjunct to standard PCR thermocyclic elements, chip-based PCR can also be incorporated into the present invention. A recent example of chip-based PCR was discussed by Kopp et al. (1998) “Chemical Amplification: Continuous Flow PCR on a Chip” Science 280: 1046-1047. Kopp et al. describe a microfluidic continuous flow PCR system where the PCR reactants were flowed through a chip having three discrete temperature zones. The reagents within the channel underwent essentially instantaneous changes in temperature. Thus, the cycle time in this system reflected the time at each temperature, with no substantial temporal contribution from ramping times.

[0322] Additional chip-based PCR methods are set forth in U.S. Pat. No. 5,587,128 to Wilding et al. (Dec. 24, 1996, “MESOSCALE POLYNUCLEOTIDE AMPLIFICATION DEVICES”), which similarly incorporate hot zones and fluid flow to achieve temperature cycling. PCR can also be performed by fluid resistance heating in microchips. For example, U.S. Pat. No. 5,965,410, to Chow, et al., Oct. 12, 1999, “ELECTRICAL CURRENT FOR CONTROLLING FLUID PARAMETERS IN MICROCHANNELS” describes such devices.

[0323] In certain embodiments, non-thermocyclic polymerase mediated amplification can be achieved, i.e., using a chemical denaturation device or an electrostatic denaturation device. For example, U.S. Pat. No. 5,939,291 by Loewy et al., Aug. 17, 1999 “MICROFLUIDIC METHOD FOR NUCLEIC ACID AMPLIFICATION” describes such devices. This invention can also be used with polymerases capable of performing under unusual or biochemically challenging environments such as are created under extreme shear forces, temperatures, salt concentrations, or the presence of one or more non-aqueous solvents and other chemicals. Such enzymes may be generated via the shuffling and mutagenesis techniques disclosed here and elsewhere in the art.

[0324] (6.) PCR Amplification of Individual Fragments

[0325] It is generally preferable to amplify diversified nucleic acids by PCR or any of the other amplification techniques herein prior to an in vitro transcription and translation step. This is desirable because single copy genes can become damaged or otherwise compromised during the course of the transcription/translation or assay steps, making rescue of the genetic material problematic. Also, PCR amplification of a single gene copy can be suboptimal, although it is known to be possible (Ohuchi et al. (1998) “In vitro Generation of protein libraries using PCR amplification of a single DNA molecule and coupled protein transcription/translation,” Nucleic Acids Res. 26(19):4339-4346). The true number of starting genes in each reaction can be estimated using quantitative PCR. Such quantification involves, e.g., imaging of the amplified products via methods involving fluorescence detection, fluorescence resonance energy transfer, autoradiography, chemiluminescence or visible dyes.
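Estimating the number of starting genes by quantitative PCR can be illustrated with the textbook exponential amplification model, in which the copy number after c cycles is N₀(1+E)^c for a per-cycle efficiency E. The sketch below inverts that model; the function name, the threshold copy number and the efficiency values are hypothetical, and real assays are normally calibrated against a standard curve rather than relying on an assumed efficiency.

```python
def starting_copies(threshold_copies, threshold_cycle, efficiency=1.0):
    """Estimate initial template copies from N_c = N_0 * (1 + E) ** c.

    threshold_copies: copies assumed present when the signal crosses the
    detection threshold (a hypothetical calibration value).
    efficiency: fraction of templates duplicated per cycle (1.0 = perfect doubling).
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return threshold_copies / (1.0 + efficiency) ** threshold_cycle


# A reaction crossing a 2**30-copy threshold at cycle 30 with perfect doubling
# is consistent with a single starting molecule.
print(starting_copies(2 ** 30, 30))
print(round(starting_copies(2 ** 30, 27), 1))
```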

[0326] (7.) Measuring Diversity/Library Quality Module

[0327] The diversity generation module can include a nucleic acid deconvolution module (or this module can exist separately to identify nucleic acids in other portions of the system). For example, the diversity generation module can include an identification portion, which identifies one or more nucleic acid portion or subportion.

[0328] A variety of nucleic acid deconvolution methods can be used, including nucleic acid sequencing, restriction enzyme digestion, dye incorporation and the like. The module can determine a recombination frequency (e.g., by dye incorporation, labeled nucleotide incorporation, sequencing, restriction enzyme digestion, rescue PCR, etc.) or a length of product (by any molecular sizing technology, or by dye incorporation, nucleotide incorporation, sequencing, restriction enzyme digestion, rescue PCR, etc.), or both a recombination frequency and a length, for the resulting elongated nucleic acids. Detection can be by detecting labels associated with nucleic acid products (e.g., detection of a dye, radioactive label, biotin, digoxin, a fluorophore, etc.), or simply by detecting the nucleic acid directly. Secondary assays such as fluorogenic 5′ nuclease assays can be used for detection. For example, the extent of PCR amplification can be determined by incorporation of a label into one or more amplified elongated nucleic acid, a fluorogenic 5′ nuclease assay, TaqMan, FRET, etc.

[0329] In general, an important factor in producing diverse nucleic acids in the diversity generation module(s) is the ability to measure the diversity which is generated. For example, if there is limited recombination in a shuffling reaction, the library of nucleic acids which is produced is often not sufficiently diverse for optimal screening of an activity of interest. Thus, in preferred embodiments, the shuffling module assesses the degree of diversity, generally before any screening is performed.

[0330] Diversity assessment can be performed in a number of ways. Aliquots of diverse populations of nucleic acids can be cloned or amplified (e.g., via standard primers which provide for amplification of all or at least some members of the pool) by limiting dilution. These nucleic acids can then be sequenced, e.g., using automated sequencing methods and apparatus. The diversity of the population is then assessed, e.g., using sequence alignment algorithms, by visual inspection, or the like. Pools which are determined to be diverse can then be selected for an activity of interest, used as substrates in additional recombination reactions, or the like.
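As a rough illustration of assessing diversity from sequenced clones, the following Python sketch counts distinct sequences and computes the mean pairwise identity of a small set of reads. It assumes the reads have already been aligned to a common length; the function names and the example sequences are hypothetical and are not part of the systems described herein.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Fraction of identical positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def diversity_summary(seqs):
    """Summarize a pool: clone count, distinct sequences, mean pairwise identity."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences to compare")
    pairs = list(combinations(seqs, 2))
    mean_identity = sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)
    return {
        "n": len(seqs),
        "unique": len(set(seqs)),
        "mean_identity": mean_identity,
    }


# Hypothetical aligned reads from four clones of one pool.
clones = ["ACGTACGT", "ACGTACGA", "ACCTACGT", "ACGTACGT"]
summary = diversity_summary(clones)
print(summary)
```

A lower mean pairwise identity and a higher count of distinct sequences suggest a more diverse pool; any cutoff used to accept or reject a pool would have to be chosen for the particular library.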

[0331] Sometimes it is possible to make a determination, or an approximation, of diversity, without having to sequence members of the population of nucleic acids. For example, a rescue PCR or LCR reaction can be performed that is designed to preferentially amplify recombined nucleic acids. In such rescue reactions, rescue PCR or LCR primers are provided which correspond to a subset (and, occasionally, only one) of the original parental nucleic acids that were acquired as noted above. By performing combinatorial PCR or LCR reactions using such primers, it is possible to determine whether recombination has taken place between two or more parental nucleic acids. That is, nucleic acids which are produced are optionally only amplified in the rescue PCR or LCR process if they have sequences corresponding to two or more parental nucleic acids (excluding PCR/LCR control reactions). Recombination events are detected using appropriate combinations of primers in the rescue reaction.
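The combinatorial primer scheme can be sketched as follows. This minimal Python example assumes, purely for illustration, that each parent has one parent-specific forward primer and one parent-specific reverse primer; pairs that mix two parents are treated as rescue reactions, and same-parent pairs are treated as control reactions. The function name and parent labels are hypothetical.

```python
from itertools import product


def rescue_primer_pairs(parents):
    """Split all forward/reverse primer combinations into rescue and control sets.

    A (forward, reverse) pair drawn from two different parents can only yield
    product from a molecule carrying sequence from both parents; a pair drawn
    from the same parent serves as a control reaction.
    """
    rescue, controls = [], []
    for fwd, rev in product(parents, repeat=2):
        if fwd == rev:
            controls.append((fwd, rev))
        else:
            rescue.append((fwd, rev))
    return rescue, controls


rescue, controls = rescue_primer_pairs(["A", "B", "C"])
print("rescue reactions:", rescue)
print("control reactions:", controls)
```

For three parents this gives six rescue reactions and three controls, which fits comfortably in a single row of a microtiter tray.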

[0332] PCR/LCR products can be detected in solution, eliminating the need for separation or sequencing (although these approaches can be used, if desired, to provide more complete information of what sequences are rescued). For example, the amount of double-stranded DNA in the rescued pool provides an indication as to whether a PCR/LCR was successful. Thus, if there is double-stranded DNA following a rescue PCR/LCR amplification on a subset of the pool, then it is likely that the assembly reaction worked properly, producing recombinant nucleic acids. Simply monitoring double-strand DNA specific dye incorporation in a PCR/LCR rescue reaction provides at least a first approximation of the efficiency of the fragmentation and reassembly process.

[0333] For example, the PicoGreen dsDNA quantitation reagent (available, e.g., from Molecular Probes) can be used to monitor and quantitate dsDNA. Similarly, the OliGreen ssDNA reagent can be used to monitor and quantitate ssDNA (including oligonucleotides) and the RiboGreen RNA quantitation reagent can be used to monitor RNA. See, e.g., Haugland (1996) Handbook of Fluorescent Probes and Research Chemicals Sixth Edition by Molecular Probes, Inc. (Eugene Oreg.) and http://www.probes.com/handbook (the on-line 1999 version of the Handbook of Fluorescent Probes and Research Chemicals Sixth Edition by Molecular Probes, Inc.) (Molecular Probes, 1999). For example, Molecular Probes 1999, Chapter 8 (e.g., section 8.2) provides details regarding quantitation of DNA in solution.

[0334] The PicoGreen reagent (e.g., Molecular Probes Nos. P-7581, P-11495) and Kit (Molecular Probes Nos. P-7589, P-11496) accurately quantitate as little as 25 pg/mL of double-stranded DNA (dsDNA) in a fluorometer or 250 pg/mL (typically 50 pg in a 200 μL volume) in a fluorescence microplate reader. The PicoGreen assay is greater than 10,000 times more sensitive than conventional UV absorbance measurements at 260 nm (an A260 of 0.1 corresponds to a 5 μg/mL dsDNA solution). Although the PicoGreen reagent is not actually specific for dsDNA, it shows a >1000-fold fluorescence enhancement upon binding to dsDNA, and less fluorescence enhancement upon binding to single-stranded DNA (ssDNA) or RNA, making it possible to quantitate dsDNA in the presence of ssDNA, RNA, proteins or other materials. Thus, the PicoGreen reagent allows direct quantitation of PCR amplicons without purification from the reaction mixture and makes it possible to detect low levels of DNA contamination in recombinant protein products.

[0335] Protocols for the PicoGreen assay are amenable to high-throughput screening in the systems herein. The dye is added to the sample (e.g., in a microtiter tray) and incubated for about five minutes, and then the fluorescence is measured. In addition, the fluorescence signal from binding of the PicoGreen reagent to dsDNA is linear over at least four orders of magnitude with a single dye concentration. Linearity is maintained in the presence of several compounds commonly found in nucleic acid preparations, including salts, urea, ethanol, chloroform, detergents, proteins and agarose.
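Because the dye signal is linear in dsDNA concentration, a plate reading can be converted to a concentration with a simple standard curve. The following Python sketch fits a straight line to standards by ordinary least squares and inverts it for an unknown well. The standard concentrations and fluorescence readings are made-up illustrative numbers, not measured data, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_signal(signal, slope, intercept):
    """Invert the standard curve to estimate concentration from fluorescence."""
    return (signal - intercept) / slope


# Illustrative standards spanning several orders of magnitude (ng/mL)
# and invented fluorescence readings (arbitrary units).
standards_ng_per_ml = [0.0, 1.0, 10.0, 100.0, 1000.0]
standards_rfu = [5.0, 7.0, 25.0, 205.0, 2005.0]

slope, intercept = fit_line(standards_ng_per_ml, standards_rfu)
unknown_ng_per_ml = concentration_from_signal(105.0, slope, intercept)
print(f"estimated dsDNA: {unknown_ng_per_ml:.1f} ng/mL")
```

In practice the standards would be prepared in a buffer resembling the sample, and readings outside the range of the standards would be treated with caution.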

[0336] For detecting oligonucleotides and other ssDNA, the OliGreen ssDNA quantitation reagent from Molecular Probes (Nos. O-7582 and/or O-11492) can be used. The OliGreen ssDNA quantitation reagent enables quantitation of as little as 100 pg/mL of ssDNA (200 pg in a 2 mL assay volume) with a standard fluorometer, or 1 ng/mL (200 pg in a 200 μL assay volume) using a fluorescence microplate reader. Thus, quantitation with the OliGreen reagent is about 10,000 times more sensitive than quantitation with UV absorbance methods and at least 500 times more sensitive (and far faster, with a greater throughput) than detecting oligonucleotides on electrophoretic gels stained with ethidium bromide.

[0337] The OliGreen ssDNA quantitation reagent does exhibit fluorescence enhancement when bound to dsDNA and RNA. Like the PicoGreen assay, the linear detection range of the OliGreen assay in a standard fluorometer extends over four orders of magnitude (from 100 pg/mL to 1 μg/mL) with a single dye concentration. The linearity of the OliGreen assay is also maintained in the presence of several compounds commonly found to contaminate nucleic acid preparations, including salts, urea, ethanol, chloroform, detergents, proteins, ATP and agarose (see, e.g., the OliGreen product information sheet from Molecular Probes); however, many of these compounds do affect signal intensity, so standard curves are typically generated using solutions that closely mimic those of the samples. The OliGreen reagent shows a large fluorescence enhancement when bound to poly(dT) but only a relatively small fluorescence enhancement when bound to poly(dG) and little signal with poly(dA) and poly(dC). Thus, it is helpful to use an oligonucleotide with similar base composition when generating a standard curve for concentration dependence. The OliGreen ssDNA quantitation reagent can be used for quantitation of antisense oligonucleotides, aptamers, genomic DNA isolated under denaturing conditions, LCR/PCR primers, phosphorothioate and phosphodiester oligodeoxynucleotides, sequencing primers, single-stranded phage DNA, etc.

[0338] Other dyes such as the Cyanine Dyes and Phenanthridine Dyes can also be used for Nucleic Acid Quantitation in Solution and are, therefore, adaptable to use in the present invention. See, Molecular Probes, supra, for a discussion of these and many other nucleic acid staining and quantitation dyes.

[0339] In one embodiment, a real time PCR assay system such as the “TaqMan” system is used for library quality determinations. Real time PCR product analysis by, e.g., FRET or TaqMan (and related real time reverse-transcription PCR) is a known technique for real time PCR monitoring that has been used in a variety of contexts (see, Laurendeau et al. (1999) “TaqMan PCR-based gene dosage assay for predictive testing in individuals from a cancer family with INK4 locus haploinsufficiency” Clin Chem 45(7):982-6; Laurendeau et al. (1999) “Quantitation of MYC gene expression in sporadic breast tumors with a real-time reverse transcription-PCR assay” Clin Chem 59(12):2759-65; and Kreuzer et al. (1999) “LightCycler technology for the quantitation of bcr/abl fusion transcripts” Cancer Research 59(13):3171-4). Examples of these embodiments are set forth in more detail in the two following examples.

[0340] Example: Parallel Determination of Family Library Quality without Cloning or Sequencing

[0341] A significant rate limiting step in the creation of shuffled libraries is the determination of library quality. Since chimera formation depends on multiple parameters (fragment size, gene size, GC content, annealing temperature, extension temperature, number of parents, homology between parents), it is difficult to predict the conditions required for a certain crossover frequency.

[0342] An alternative to complete control of the shuffling process is to gain precise control (i.e., for reproducibility) over important parameters (such as fragment size, annealing and extension temperatures, parental representation, etc.) and then to make multiple libraries in which these are systematically varied, e.g., in a microtitre plate format. The problem then is how to assess rapidly the quality of these libraries without the labor-intensive and costly processes of cloning and sequencing.

[0343] There are two common determinants of the quality of shuffled libraries: the frequency of recombination used to produce the library members, and the frequency with which frame shifts or deletions prevent the synthesis of full-length protein.

[0344] The TaqMan system (Perkin Elmer Biosystems) provides one example of available technology that can be adapted to address these problems. TaqMan is a real-time PCR detection system that works as follows. Two oligonucleotides are used as amplification primers, e.g., about two or three hundred bases apart. A third oligonucleotide, complementary to a section of DNA between these primers, is labeled with a fluorescent dye and a fluorescence quencher. During PCR, the third oligonucleotide anneals to the single stranded product DNA, and is then degraded by the 5′ to 3′ exonuclease activity of the polymerase as it extends through the region to which the labeled oligonucleotide is annealed. Degradation of the labeled oligonucleotide separates the fluorescent dye from the quencher, resulting in an increase in fluorescence. The cycle number at which an increase in fluorescence appears indicates the abundance of a particular template.
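The relationship between the cycle at which fluorescence rises and template abundance can be illustrated with a short Python sketch. It finds the fractional cycle at which a fluorescence trace first crosses a chosen threshold and converts the difference between two such cycles into a fold difference in starting template. The threshold value, the example traces and the assumption of perfect doubling at each cycle are illustrative assumptions, not parameters of any particular instrument.

```python
def threshold_cycle(fluorescence, threshold):
    """Fractional cycle at which fluorescence first reaches the threshold.

    fluorescence[0] is the reading after cycle 1. Linear interpolation is
    used between the two readings that bracket the threshold. Returns None
    if the threshold is never reached.
    """
    if fluorescence and fluorescence[0] >= threshold:
        return 1.0
    for i in range(1, len(fluorescence)):
        prev, cur = fluorescence[i - 1], fluorescence[i]
        if prev < threshold <= cur:
            # prev was read at cycle i, cur at cycle i + 1
            return i + (threshold - prev) / (cur - prev)
    return None


def relative_abundance(ct_sample, ct_reference, efficiency=2.0):
    """Fold abundance of sample over reference, assuming a fixed per-cycle gain."""
    return efficiency ** (ct_reference - ct_sample)


# Invented traces: chimera class A rises two cycles earlier than class B.
trace_a = [1, 1, 1, 2, 4, 8, 16, 32]
trace_b = [1, 1, 1, 1, 1, 2, 4, 8, 16, 32]

ct_a = threshold_cycle(trace_a, threshold=6)
ct_b = threshold_cycle(trace_b, threshold=6)
fold = relative_abundance(ct_a, ct_b)
print(ct_a, ct_b, fold)
```

An earlier threshold cycle corresponds to a more abundant template; real amplification efficiencies are usually somewhat below two-fold per cycle, so the computed fold difference is an approximation.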

[0345] The TaqMan system can be adapted to measure the abundance of various chimeras in a microtiter format. Varying the primers and indicator oligonucleotides used allows detection of different classes of chimeras (see, FIG. 9). A simple tiered screen can be used in which libraries are first screened for the presence of a fragment of B or C, incorporated between two fragments of A. Libraries that score well in this test could then be tested for more complex chimera arrangements. Finally, the best few (5 or so) libraries are cloned into a translational-coupling vector, and full-length variants are picked, screened and sequenced. This, in turn, generates feedback about the types of chimeras that are the best indicators for a specific function, and the relationship between the simple chimera indicator described here and the real sequences generated.

[0346] As shown in FIG. 9, a labeled B oligo can be used to measure the relative differences of, e.g., 8 possible crossovers. Alternately, several different fluorescently labeled oligos can be used in the same well of a reaction tray or other container. In this scheme, a library is tested by amplifying with a specific primer, and the fluorescence of the different indicator dyes for A, B and C is measured as a function of the number of cycles (e.g., PCR cycles). This gives an indication of the frequency of the types of crossovers present in the library sample, illustrated schematically.

[0347] This kind of library screening dramatically increases the throughput for library assessment as compared to previous methods.

[0348] An alternative to TaqMan is the use of molecular beacons to assess library quality. Molecular beacons are oligonucleotide probes that can report the presence of specific nucleic acids in homogeneous solutions (Tyagi and Kramer (1996) “Molecular beacons: probes that fluoresce upon hybridization” Nat Biotechnol 14:303-308). They are used for real-time monitoring of PCR or other amplification reactions and for the detection of RNAs within living cells. Molecular beacons are hairpin-shaped molecules with an internally quenched fluorophore whose fluorescence is restored when they bind to a target nucleic acid (see Tyagi and Kramer, id). They are designed so that the loop portion of the molecule is a probe sequence complementary to a target nucleic acid molecule. The stem is formed by an annealing of complementary arm sequences on the ends of the probe sequence. A fluorescent moiety is attached to the end of one arm and a quenching moiety is attached to the end of the other arm. The stem keeps these two moieties in close proximity to each other, causing the fluorescence of the fluorophore to be quenched by energy transfer. When the probe encounters a target molecule, it forms a hybrid that is longer and more stable than the stem hybrid and its rigidity and length preclude the simultaneous existence of the stem hybrid. Thus, the molecular beacon undergoes a spontaneous conformational reorganization that forces the stem apart, and causes the fluorophore and the quencher to move away from each other, leading to the restoration of fluorescence which can be detected. Further details on Molecular Beacons and their use can be found at http://www.molecular-beacons.org and in the following references: Tyagi et al. (1998) “Multicolor molecular beacons for allele discrimination” Nat Biotechnol 16:49-53; Matsuo (1998) “In situ visualization of mRNA for basic fibroblast growth factor in living cells” Biochimica Biophysica Acta 1379:178-184; Sokol et al. (1998) “Real time detection of DNA-RNA hybridization in living cells” Proc Natl Acad Sci USA 95:11538-11543; Leone et al. (1998) “Molecular beacon probes combined with amplification by NASBA enable homogeneous, real-time detection of RNA” Nucleic Acids Res 26:2150-2155; Piatek et al. (1998) “Molecular beacon sequence analysis for detecting drug resistance in Mycobacterium tuberculosis” Nat Biotechnol 16:359-363; Kostrikis et al. (1998) “Spectral genotyping of human alleles” Science 279:1228-1229; Giesendorf et al. (1998) “Molecular beacons: a new approach for semiautomated mutation analysis” Clin Chem 44:482-486; Marras et al. (1999) “Multiplex detection of single-nucleotide variations using molecular beacons” Genet Anal 14:151-156; and Vet et al. (1999) “Multiplex detection of four pathogenic retroviruses using molecular beacons” Proc Natl Acad Sci USA 96:6394-6399.

[0349] Thus, the presence or absence of any specific nucleic acid (including any mutated nucleic acid) can be monitored in real time via the use of Molecular Beacons.

[0350] Example: Monitoring of Recombination Using Fluorescence Energy Transfer

[0351] After performing a diversity generation reaction, an extensive analysis of the library can be performed to check whether there was recombination between genes (or other nucleic acids) and at what frequency. An immediate answer to those questions speeds up the construction of the relevant libraries. Furthermore, if the monitoring is continuous during the shuffling reaction, the conditions can be changed to optimize recombination, even before the end of the reaction.

[0352] The process in this example utilizes real time PCR analysis based upon FRET. The method uses “light cycler” techniques (De Silva et al. (1998) “Rapid Genotyping and Quantification on the LightCycler with Hybridisation Probes” Biochemica 2:12-15, and De Silva et al. (1998) “The LightCycler-The Smartest Innovation for More Efficient PCR” Biochemica 2:4-7).

[0353] Fluorescence resonance energy transfer (FRET) is a distance dependent excited state interaction in which emission of one fluorophore is coupled to the excitation of another which is in proximity (close enough for an observable change in emissions to occur). Some excited fluorophores interact to form excimers, which are excited state dimers that exhibit altered emission spectra (e.g., phospholipid analogs with pyrene sn-2 acyl chains); see, Haugland (1996) Handbook of Fluorescent Probes and Research Chemicals, Published by Molecular Probes, Inc., Eugene, Oreg., e.g., at chapter 13.

[0354] The Förster radius ($R_o$) is the distance between fluorescent pairs at which energy transfer is 50% efficient (i.e., at which 50% of excited donors are deactivated by FRET). The magnitude of $R_o$ is dependent on the spectral properties of the donor and acceptor dyes: $R_o = [(8.8 \times 10^{23})(\kappa^2)(n^{-4})(QY_D)(J(\lambda))]^{1/6}$ Å, where: $\kappa^2$ = dipole orientation range factor (range 0 to 4; $\kappa^2 = 2/3$ for randomly oriented donors and acceptors); $QY_D$ = fluorescence quantum yield of the donor in the absence of the acceptor; $n$ = refractive index; and $J(\lambda)$ = spectral overlap integral $= \int \varepsilon_A(\lambda) \cdot F_D(\lambda) \cdot \lambda^4 \, d\lambda$ cm³M⁻¹, where $\varepsilon_A$ = extinction coefficient of the acceptor and $F_D$ = fluorescence emission intensity of the donor as a fraction of total integrated intensity. Typical donor-acceptor pairs include fluorescein/Cy5, fluorescein/tetramethylrhodamine, IAEDANS/fluorescein, fluorescein/fluorescein, BODIPY/BODIPY and EDANS/DABCYL. An extensive compilation of $R_o$ values is found in the literature; see, Haugland (1996) Handbook of Fluorescent Probes and Research Chemicals, Published by Molecular Probes, Inc., Eugene, Oreg. at page 46 and the references cited therein.
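The distance dependence implied by the Förster radius can be made concrete with a small Python sketch. The first function evaluates the expression for the Förster radius given above; the second uses the standard relation that transfer efficiency equals 1/(1 + (r/R_o)^6), which gives 50% at r = R_o. The numerical inputs (quantum yield, refractive index and overlap integral) are illustrative assumptions rather than values for any particular dye pair.

```python
def forster_radius_angstrom(kappa2, n, qy_donor, j_overlap):
    """Forster radius in angstroms from the expression in the text.

    kappa2    -- dipole orientation factor (2/3 for random orientation)
    n         -- refractive index of the medium
    qy_donor  -- donor quantum yield in the absence of acceptor
    j_overlap -- spectral overlap integral, in cm^3 M^-1
    """
    return (8.8e23 * kappa2 * n ** -4 * qy_donor * j_overlap) ** (1.0 / 6.0)


def fret_efficiency(r, r0):
    """Fraction of excited donors deactivated by FRET at separation r."""
    return 1.0 / (1.0 + (r / r0) ** 6)


# Illustrative inputs only; not measured values for a specific dye pair.
r0 = forster_radius_angstrom(kappa2=2.0 / 3.0, n=1.33, qy_donor=0.9,
                             j_overlap=3.0e-13)
print(f"R_o is roughly {r0:.0f} angstroms for these inputs")

for r in (0.5 * r0, r0, 2.0 * r0):
    print(f"r = {r:6.1f} A  efficiency = {fret_efficiency(r, r0):.3f}")
```

The sixth-power dependence means that efficiency falls from about 98% at half the Förster radius to under 2% at twice the Förster radius, which is why FRET reports only on fluorophores that are brought close together.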

[0355] In brief, two probes are labeled with different fluorophores. The two probes are complementary to a specific region of a gene to be analyzed. If the desired genotype (recombination event) is present in the sample, the probes bring two fluorophores into close proximity (e.g., within $R_o$), allowing a transfer of energy between them. This transfer of energy can be monitored using a device such as the one described in the De Silva et al. references (id); see also, the LightCycler from Amersham.

[0356] This approach can be used in shuffling or other diversity generating reactions using automated techniques. In order to label the DNA molecules constructed, e.g., during PCR or LCR reactions, nucleotides labeled with fluorophores are used and are introduced into the molecule by the DNA polymerase or other enzymes, or via automated synthetic approaches. The fluorophores are excited and detected by the system.

[0357] For example, two genes to be shuffled can be labeled using this method, e.g., one with fluorescein, and the other with Cy5 in a PCR reaction (both fluorophores are available, e.g., from Amersham Pharmacia). The labeled genes are fragmented, e.g., using DNaseI, before being shuffled by the system. Recombination between the two genes brings the fluorescein molecule next to the Cy5 molecule, and, e.g., after each cycle the system excites the fluorescein. The fluorescein then transfers its energy either to the Cy5 molecule, if it is proximal, or to the media if it is not. The system then detects light at the wavelength of emission of Cy5, providing an indication of FRET. Similarly, FRET can be used to assess recombination frequency by solution-phase or solid-phase hybridization to differentially labeled fluorescence-coupled oligonucleotide, PCR amplified or restriction fragment-generated probes.
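One simple way to follow such per-cycle readings is to track the ratio of acceptor (Cy5) emission to donor (fluorescein) emission under donor excitation. The Python sketch below computes that ratio for each cycle and reports the first cycle at which it rises a chosen fold above an early baseline. The readings, the two-cycle baseline and the two-fold criterion are all illustrative assumptions.

```python
def acceptor_donor_ratios(readings):
    """Per-cycle ratio of acceptor emission to donor emission.

    readings is a list of (donor_emission, acceptor_emission) tuples, one per
    cycle, all measured while exciting the donor.
    """
    return [acceptor / donor for donor, acceptor in readings]


def first_cycle_above(ratios, baseline_cycles=2, fold=2.0):
    """First cycle (1-indexed) whose ratio is at least fold times the baseline."""
    baseline = sum(ratios[:baseline_cycles]) / baseline_cycles
    for cycle, ratio in enumerate(ratios, start=1):
        if cycle > baseline_cycles and ratio >= fold * baseline:
            return cycle
    return None


# Invented readings: donor signal falls and acceptor signal rises as
# recombination brings the two labels together.
readings = [(100.0, 5.0), (100.0, 5.0), (95.0, 7.6), (90.0, 13.5), (80.0, 24.0)]
ratios = acceptor_donor_ratios(readings)
onset = first_cycle_above(ratios)
print(ratios, onset)
```

A ratio is used rather than raw acceptor signal so that well-to-well differences in total labeled DNA have less influence on the comparison.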

[0358] (8.) Non-Coding Control Sequences

[0359] Quite commonly, output nucleic acids from the shuffling or mutagenesis module comprise one or more sequences which control transcription or translation or which facilitate downstream processing of the nucleic acid (e.g., cloning). These sequences include promoters, enhancers, ribosome binding sites, translation initiation regions, transcription initiation regions, universal PCR primer binding sites, sequencing primer binding sites, restriction enzyme digestion sequences and other sequences of known activity. Ausubel, Sambrook, Berger and a number of other references herein provide an introduction to sequences useful in genetic engineering. Many such sequences are known and can easily be provided in the present methods, if desired. For example, including such sequences as part of PCR or ligase-directed gene synthesis is a convenient way of incorporating such sequences of interest.

[0360] Amplifying recombinant nucleic acids in physical or logical arrays, or amplifying elongated nucleic acids in master arrays, duplicate arrays or other arrays herein can include, as a feature of the amplification, the incorporation of one or more transcription or translation control subsequence into the elongated nucleic acids, recombinant nucleic acids in the physical or logical array, intermediate nucleic acids produced using elongated nucleic acids or recombinant nucleic acids in the physical or logical array as a template, partial or complete copies of elongated nucleic acids or recombinant nucleic acids in the physical or logical arrays, and the like. One or more transcription or translation control subsequence can be ligated to the elongated nucleic acids, the recombinant nucleic acids in the physical or logical array, intermediate nucleic acids produced using the elongated nucleic acids or the recombinant nucleic acids in the physical or logical array as a template, partial or complete copies of the elongated nucleic acids or the recombinant nucleic acids in the physical or logical array, etc. For example, the one or more transcription or translation control subsequences can be hybridized or partially hybridized to the above nucleic acids during any nucleic acid amplification or polymerase or ligase mediated method herein.

[0361] (9.) Isolation of Single DNA Molecules from a Mixed Pool without Bacterial Transformation

[0362] This section describes a method that allows pieces of DNA to be singly isolated from a pool and amplified for sequencing or other process (e.g., shuffling or in vitro translation) without the use of a host organism. The method is both faster and more reliable than traditional cloning. The method is based upon the ability to form particles from individual pieces of DNA that can then be isolated and dispensed into individual wells. The particles are degraded and each piece of DNA is amplified to give enough material for sequencing or other downstream operations.

[0363] The advantage of this protocol is that the particles are formed due to the physical nature of the DNA polymer, and as such, the protocol is sequence and context independent. Thus, all pieces of DNA have approximately the same chance of being amplified at the end of the process, unlike traditional cloning methods.

[0364] DNA Library Preparation

[0365] When cloning from genomic DNA, the DNA is usually cleaved to a suitable size by nuclease (e.g., restriction enzyme) or mechanical treatment. To amplify the DNA, the ends of each fragment must be compatible, e.g., for PCR amplification using standard primers. This is true if the DNA molecules have a standard construction with fixed 5′ and 3′ ends (as is usual for RNA or DNA selection constructs and for expression constructs). For cloning of fragments of unknown DNA (or following mechanical or random cleavage procedures), this is achieved by ligation of standard primers to the end of each fragment for subsequent ligation into a vector. Fluorescent or other tags can be added to the extension to aid handling and analysis. Successfully ligated molecules can be enriched in the pool by PCR and purified, if necessary, by standard methods.

[0366] Monomolecular Particle Formation

[0367] DNA is a rigid polyanionic linear polymer that exists as a monomer in solution with a large radius of gyration as it floats in a random coil structure. The addition of a polycationic polymer to a solution of DNA causes the DNA to associate with the polycation and condense in a cooperative electrostatic process to yield a compact complex. Due to the electrostatic nature of the process, there is a tendency for multiple copies of the two polymers to associate to give large poorly defined mixtures of particles.

[0368] Complexation of DNA with single chain cationic detergents is known to form small monomolecular particles (J. Am. Chem. Soc. 1995, 117, 2401-2408), but these complexes are unstable to reduction of the detergent concentration. The ability of single chain detergents to form complexes is based upon the formation of the polycation at the DNA in a template-assisted assembly. Hence, addition of such a detergent to a solution of DNA leads to formation of small (~20 nm) complexes which can then be dispensed into individual wells. Dilution of the particles with a PCR mix leads to dissolution of the complex, releasing free DNA ready for amplification.

[0369] Complexes formed with detergent can be relatively unstable. However, other methods of forming monomolecular complexes are available. See, e.g., Blessing (1998) Proc. Natl. Acad. Sci. USA 95:1427-1431. In this protocol, the single chain cationic detergent contains a chemical moiety such as a thiol group. Once the complex has formed, the detergents are dimerized (by oxidation for thiols), which yields a stable particle. Once the particles are dispensed, the dimerization is reversed (reduction of the disulfide) and the complex degrades to yield free DNA. Addition of lipophilic fluorophores to these complexes leads to production of a fluorescent particle. This can be used to track the complexes for sorting as described below.

[0370] Dispensing the Particles

[0371] The charged complexes formed by the protocols outlined above are readily sorted by electrophoretic mobility to remove uncomplexed material. Dispensing these particles into separate wells of a microtiter plate uses, e.g., electrophoresis, e.g., in which the particles travel down a capillary (or channel) in single file, much like in a FACS machine (or chip). A fluorescent detector (e.g., LIF, confocal laser with suitable PMT/CCD) set up at the end of the system detects passage of particles and directs particles into the receiving well. Flow cytometry systems which will sort into microtiter plates of any format are available, e.g., from Cytomation (http://www.cytomation.com/; Fort Collins, Colo.).

[0372] Release of the Free DNA

[0373] Stability of the DNA-detergent complex is sensitive to reduction in detergent concentration. Thus, dilution of the particles into a PCR mix leads to dissolution of the complex, releasing free DNA for amplification. The PCR product can then be used for the desired purpose (sequencing, in vitro transcription/translation, etc.).

[0374] (10.) Array Copy Systems

[0375] During operation of the devices of the invention, populations of nucleic acids can be arranged into one or more physical or logical recombinant nucleic acid arrays. In several of the procedures herein, a duplicate of at least one of the one or more physical or logical recombinant nucleic acid arrays is produced in the process of amplifying, sequencing, or expressing members of the nucleic acid array. Thus, in one typical embodiment, the system includes a shuffled nucleic acid master array which physically or logically corresponds to positions of the shuffled nucleic acids in the reaction mixture array. This master array can be accessed as necessary, e.g., where access of reaction mixture or other duplicated nucleic acid arrays is not feasible. See also, FIG. 1b.

[0376] In general, the diversity generation module can copy arrays (i.e., the module can include an array copy function) to produce duplicate arrays, master arrays, amplified arrays and the like, e.g., where any operation is contemplated which could make recovery of nucleic acids from an original array problematic (e.g., where a process to be performed destroys the original nucleic acids, e.g., recombination methods that change the nature of product nucleic acids as compared to starting nucleic acids), or where an elevated stability for the array would be helpful (e.g., where an amplified array can be produced to stabilize accessible copies of nucleic acids), or where a normalization of components (e.g., to provide similar concentrations of reactants or products) is useful for recombination, expression or analysis purposes. Copies can be made from master arrays, reaction mixture arrays or any duplicates thereof.

[0377] For example, the diversity generation module optionally dispenses nucleic acids into one or more master multiwell plates and, typically, amplifies the resulting master array of elongated nucleic acids (e.g., by PCR) to produce an amplified array of elongated nucleic acids. The shuffling module can include an array copy system which transfers aliquots from the wells of the one or more master multiwell plates to one or more copy multiwell plates.
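The logical correspondence between a master plate and its copies can be sketched as a transfer list of the kind a fluid handling system might consume. The Python example below assumes a 96-well plate laid out as rows A-H and columns 1-12 and transfers the same aliquot from each master well to the identically named well of each copy plate. The plate identifiers, the aliquot volume and the record layout are hypothetical.

```python
import string


def well_names(rows=8, cols=12):
    """Well labels for a multiwell plate, e.g. A1..H12 for a 96-well plate."""
    return [f"{string.ascii_uppercase[r]}{c}"
            for r in range(rows) for c in range(1, cols + 1)]


def copy_plan(master_id, copy_ids, aliquot_ul):
    """One transfer record per (master well, copy plate), preserving position."""
    plan = []
    for well in well_names():
        for copy_id in copy_ids:
            plan.append({
                "source": (master_id, well),
                "dest": (copy_id, well),
                "volume_ul": aliquot_ul,
            })
    return plan


# Hypothetical plate identifiers and aliquot volume.
plan = copy_plan("master-1", ["copy-1", "copy-2"], aliquot_ul=5.0)
print(len(plan), plan[0])
```

Keeping the well name identical between source and destination means that a result observed in a copy plate can be traced back to the corresponding master well without a separate lookup table.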

[0378] The array of reaction mixtures can be formed, e.g., by separate or simultaneous addition of an in vitro transcription reagent and an in vitro translation reagent to one or more copy multiwell plates (or other spatially organizing set of containers), or to a duplicate set thereof, to diversified nucleic acids.

[0379] In addition to adding reaction mixture components directly to arrays, reaction mixture components are commonly added to duplicate arrays of shuffled or otherwise diversified nucleic acids. For example, the reaction mixtures can be produced by adding in vitro transcription/translation reactants to a duplicate nucleic acid array, which is duplicated from a master array of the shuffled nucleic acids produced by spatially or logically separating members of a population of the shuffled nucleic acids.

[0380] Arraying techniques for producing both master and duplicate arrays from populations of shuffled or otherwise diversified nucleic acids can involve any of a variety of methods. For example, when forming solid phase arrays (e.g., as a copy of a liquid phase array, or as an original array), members of the population can be lyophilized or baked on a solid surface to form a solid phase array, or chemically coupled or printed (e.g., using ink-jet printing methods) to the solid surface. Similarly, population members can be converted from solid phase to liquid phase by rehydrating members of the population, or by cleaving chemically coupled members of the population of shuffled nucleic acids from the solid surface to form a liquid phase array. One or more physically separated logical or physical array members can be accessed from one or more sources of shuffled or otherwise diversified nucleic acids and moved to one or more array destination sites (e.g., by pipetting into microtiter trays), where the one or more destinations constitute a logical array of the shuffled nucleic acids.

[0381] Individual members of an array can be copied in a number of ways. For example, members can be amplified and aliquots removed and placed in a duplicate array. Alternately, where the sequences of array members are deconvoluted (e.g., sequenced), copies can be produced synthetically and placed into copy arrays. Two preferred ways of copying array members are to use a polymerase (e.g., in amplification or transcription formats) or to use an in vitro nucleic acid synthesizer for copying operations. Typically, a fluid handling system will deposit copied array members in destination locations, although non-fluid based member transport (e.g., transfer in a solid or gaseous phase) can also be performed.

[0382] B. In vitro Transcription/Translation

[0383] In one preferred embodiment of the invention, libraries of nucleic acids produced by the various diversity generation methods set forth herein (shuffling, mutation, etc.) are transcribed (i.e., where the diverse nucleic acids are DNAs) into RNA and translated into proteins, which are screened by any appropriate assay. Common in vitro transcription and/or translation reagents include reticulocyte lysates (e.g., rabbit reticulocyte lysates), wheat germ in vitro translation (IVT) mixtures, E. coli lysates, canine microsome systems, HeLa nuclear extracts, the “in vitro transcription component” (see, e.g., Promega technical bulletin 123), SP6 polymerase, T3 polymerase, T7 RNA polymerase (e.g., Promega # TM045), the “coupled in vitro transcription/translation system” (Progen Single Tube Protein System 3) and many others. Many translation systems are described, e.g., in Ausubel, supra, as well as in the references below, and many transcription/translation systems are commercially available.

[0384] Methods of processing (transcribing and/or translating) diversified nucleic acids (shuffled, mutagenized, etc.) are provided. In the methods, a physical or logical array of reaction mixtures is provided, in which a plurality of the reaction mixtures include one or more members of a first population of nucleic acids (including shuffled, mutagenized or otherwise diversified nucleic acids). A plurality of the plurality of reaction mixtures further comprise an in vitro transcription or translation reactant. One or more in vitro translation products produced by a plurality of members of the physical or logical array of reaction mixtures are then detected. The physical or logical arrays of reaction mixtures produced by these methods are also a feature of the invention.

[0385] Generally, cell-free transcription/translation systems can be employed to produce polypeptides from solid or liquid phase arrays of DNAs or RNAs as provided by the present invention. Several transcription/translation systems are commercially available and can be adapted to the present invention by the appropriate addition of transcription and/or translation reagents to arrays of diversified nucleic acids, e.g., produced by shuffling target nucleic acids and arraying the resulting nucleic acids. A general guide to in vitro transcription and translation protocols is found in Tymms (1995) In vitro Transcription and Translation Protocols: Methods in Molecular Biology Volume 37, Garland Publishing, NY. Any of the reagents used in these systems can be flowed or otherwise directed into contact with nucleic acid array members.

[0386] Typically, in the present invention, in vitro transcription and/or translation reagents are added to an array (or duplicate thereof) that embodies the diverse populations of nucleic acids generated by diversity generating procedures. For example, where the nucleic acids of interest are plated on microtiter trays, the in vitro transcription/translation reagents are added to the wells of the trays to form arrays of reaction mixtures that individually comprise the in vitro transcription/translation reagents, the nucleic acids of interest and any other reagents of interest.

[0387] Several in vitro transcription and translation systems are well known and described in Tymms (1995), id. For example, an untreated reticulocyte lysate, commonly isolated from rabbits after treatment of the rabbits with acetylphenylhydrazine, is used as a cell-free in vitro translation system. Similarly, coupled transcription/translation systems often utilize an E. coli S30 extract. See also the Ambion 1999 Product Catalogue from Ambion, Inc. (Austin, Tex.).

[0388] A variety of in vitro transcription and translation reagents are commercially available, including the PROTEINscript-PRO™ kit (for coupled transcription/translation), the wheat germ IVT kit, the untreated reticulocyte lysate kit (each from Ambion, Inc. (Austin, Tex.)), the HeLa Nuclear Extract in vitro Transcription system, the TnT Quick coupled Transcription/translation systems (both from Promega, see, e.g., Technical bulletin No. 123 and Technical Manual No. 045), and the single tube protein system 3 from Progen. Each of these available systems (as well as many other available systems) has certain advantages which are detailed by the product manufacturer.

[0389] In addition, the art provides considerable detail regarding the relative activities of different in vitro transcription/translation systems, for example as set forth in Tymms, id.; Jermutus et al. (1999) “Comparison of E. coli and rabbit reticulocyte ribosome display systems” FEBS Lett. 450(1-2):105-10 and the references therein; Jermutus et al. (1998) “Recent advances in producing and selecting functional proteins by using cell-free translation” Curr. Opin. Biotechnol. 9(5):534-48 and the references therein; Hanes et al. (1998) “Ribosome Display Efficiently Selects and Evolves High-Affinity Antibodies in vitro from Immune Libraries” PNAS 95:14130-14135 and the references therein; and Hanes and Pluckthun (1997) “In vitro Selection and Evolution of Functional Proteins by Using Ribosome Display” PNAS 94:4937-4942 and the references therein.

[0390] For example, an untreated rabbit reticulocyte lysate is suitable for initiation and translation assays where the prior removal of endogenous globin mRNA is not necessary. The untreated lysate translates exogenous mRNA, but the exogenous mRNA must compete with endogenous mRNA for limiting translational machinery.

[0391] Similarly, the PROTEINscript-PRO™ kit from Ambion is designed for coupled in vitro transcription and translation using an E. coli S30 extract. In contrast to eukaryotic systems, where the transcription and translation processes are separated in time and space, prokaryotic systems are coupled, as both processes occur simultaneously. During transcription, the nascent 5′-end of the mRNA becomes available for ribosome binding, allowing transcription and translation to proceed at the same time. This early binding of ribosomes to the mRNA maintains transcript stability and promotes efficient translation. Coupled transcription/translation using the PROTEINscript-PRO Kit is based on this E. coli model.

[0392] The Wheat Germ IVT™ Kit from Ambion, or another similar system, is a convenient alternative, e.g., when the use of a rabbit reticulocyte lysate is not appropriate for in vitro protein synthesis. The Wheat Germ IVT™ Kit can be used, e.g., when the desired translation product comigrates with globin (approx. 12-15 kDa), when translating mRNAs coding for regulatory factors (such as transcription factors or DNA binding proteins) which may already be present at high levels in mammalian reticulocytes, but not plant extracts, or when an mRNA will not translate for unknown reasons and a second translation system is to be tested.

[0393] The TNT® Quick Coupled Transcription/Translation Systems (Promega) are single-tube, coupled transcription/translation reactions for eukaryotic in vitro translation. The TNT® Quick Coupled Transcription/Translation System combines RNA Polymerase, nucleotides, salts and Recombinant RNasin® Ribonuclease Inhibitor with the reticulocyte lysate to form a single TNT® Quick Master Mix. The TNT® Quick Coupled Transcription/Translation System is available in two configurations for transcription and translation of genes cloned downstream from either the T7 or SP6 RNA polymerase promoters. Included with the TNT® Quick System are a luciferase-encoding control plasmid and Luciferase Assay Reagent, which can be used in a non-radioactive assay for rapid (<30 seconds) detection of functionally active luciferase protein.

[0394] In addition to coupled in vitro transcription and translation, either step may be done separately from the other by in vitro or cellular means. For example, in vitro transcribed RNA can be provided to cells for subsequent translation by way of mechanical or osmotic microinjection, methods for which are well known in the art. Moreover, cells containing RNA derived by transcription from one or more of the shuffling and mutagenesis methods described (directly or indirectly) herein can be lysed and the RNA obtained for subsequent analysis. The purified or unpurified RNA obtained in this manner can be subjected to in vitro or in situ translation. All such methods can be conducted within or in conjunction with the various arraying approaches described in this invention.

[0395] Many other systems are well known, well characterized and set forth in the references noted herein, as well as in other references known to one of skill. It will also be appreciated that one of skill can produce transcription/translation systems similar to those which are commercially available from available materials, e.g., as taught in the references noted above.

[0396] The methods of the invention can include in-line or off-line purification of one or more reaction product array members. In-line purification is performed as part of the transfer process from an in vitro transcription/translation reaction to a product detection or identification module, whereas off-line purification can be performed before or after transfer, or in a parallel module.

[0397] In any case, once expressed, proteins can be purified, either partially or substantially to homogeneity, according to standard procedures known to and used by those of skill in the art. Polypeptides of the invention can be recovered and purified from arrays by any of a number of methods well known in the art, including ammonium sulfate or ethanol precipitation, acid or base extraction, column chromatography, affinity column chromatography, anion or cation exchange chromatography, phosphocellulose chromatography, hydrophobic interaction chromatography, hydroxylapatite chromatography, lectin chromatography, gel electrophoresis and the like. Protein refolding steps can be used, as desired, in completing configuration of mature proteins. High performance liquid chromatography (HPLC) can be employed in final purification steps where high purity is desired. Once purified, partially or to homogeneity, as desired, the polypeptides may be used (e.g., as assay components, therapeutic reagents or as immunogens for antibody production).

[0398] In addition to the references noted supra, a variety of purification/protein folding methods are well known in the art, including, e.g., those set forth in R. Scopes, Protein Purification, Springer-Verlag, N.Y. (1982); Deutscher, Methods in Enzymology Vol. 182: Guide to Protein Purification, Academic Press, Inc. N.Y. (1990); Sandana (1997) Bioseparation of Proteins, Academic Press, Inc.; Bollag et al. (1996) Protein Methods, 2nd Edition Wiley-Liss, NY; Walker (1996) The Protein Protocols Handbook Humana Press, NJ; Harris and Angal (1990) Protein Purification Applications: A Practical Approach IRL Press at Oxford, Oxford, England; Harris and Angal Protein Purification Methods: A Practical Approach IRL Press at Oxford, Oxford, England; Scopes (1993) Protein Purification: Principles and Practice 3rd Edition Springer Verlag, NY; Janson and Ryden (1998) Protein Purification: Principles, High Resolution Methods and Applications, Second Edition Wiley-VCH, NY; and Walker (1998) Protein Protocols on CD-ROM Humana Press, NJ; and the references cited therein. Additional details regarding protein folding and other in vitro protein biosynthetic methods are found in Marszal et al. U.S. Pat. No. 6,033,868 (Mar. 7, 2000).

[0399] As noted, those of skill in the art will recognize that after synthesis, expression and/or purification, proteins can possess a conformation substantially different from the native conformations of the relevant parental polypeptides. For example, polypeptides produced by prokaryotic systems often are optimized by exposure to chaotropic agents to achieve proper folding. During purification from, e.g., lysates derived from E. coli, the expressed protein is optionally denatured and then renatured. This is accomplished, e.g., by solubilizing the proteins in a chaotropic agent such as guanidine HCl.

[0400] In general, it is occasionally desirable to denature and reduce expressed polypeptides and then to cause the polypeptides to re-fold into the preferred conformation. For example, guanidine, guanidinium, urea, detergents, chelating agents, DTT, DTE, and/or a chaperonin can be added to, or incubated with, a transcription/translation product of interest. Methods of reducing, denaturing and renaturing proteins are well known to those of skill in the art (see, the references above, and Debinski, et al. (1993) J. Biol. Chem., 268: 14065-14070; Kreitman and Pastan (1993) Bioconjug. Chem., 4: 581-585; and Buchner, et al., (1992) Anal. Biochem., 205: 263-270). Debinski, et al., for example, describe the denaturation and reduction of inclusion body proteins in guanidine-DTE. The proteins can be refolded in a redox buffer containing, e.g., oxidized glutathione and L-arginine. Refolding reagents can be flowed or otherwise moved into contact with the one or more polypeptide or other expression product, or vice-versa.

[0401] Various systems are also available for simultaneous synthesis and folding of complex proteins. For example, the control of redox potential, the use of helper proteins (from both bacterial and eukaryotic systems) and the like can be used to provide for improved cell free translation. Optionally, proteins may be added which aid in protein refolding, such as by maintaining solubility of the nascent or partially folded protein (e.g., chaperonins) or by adjusting the configuration of inter- and intra-molecular disulfide bonds (e.g., protein disulfide isomerase). In addition to the references noted above, additional details regarding cell free protein translation can be found at http://chemeng.stanford.edu/html/swartz.htm.

[0402] RNA or protein or other products of a translation reaction can be tagged with any available tag (biotin, His tag, etc.), and captured to an array position following expression, if desired. The products are released, e.g., by cleavage of an incorporated cleavage site, or other releasing methods (salt, heat, acid, base, light, or the like). In alternate embodiments, products are free in solution or encapsulated in mini-reaction compartments such as inverted micelles, liposomes, or gel particles or droplets.

[0403] As noted, it can be desirable to reconstitute expression products in liposomes, inverted micelles, or other lipid systems. Thus, the system can include a source of one or more lipid. Typically this lipid is flowed into contact with the one or more polypeptide or other reaction product (or vice-versa), or into contact with the physical or logical array of reaction mixtures. Similarly, the lipid can be flowed into contact with one or more shuffled or mutagenized nucleic acids (or transcription products thereof), thereby producing one or more liposomes or micelles comprising the polypeptide or other reaction product, reaction mixture components, and/or nucleic acids.

[0404] Liposomes and related structures are particularly attractive systems for use in the present invention, because they serve to concentrate reagents of interest into small volumes and because they are amenable to FACS and other high-throughput methods. In addition to standard FACS methods, microfabricated FACSs for use in sorting cells and certain subcellular components such as molecules of DNA have also been described in, e.g., Fu, A. Y. et al. (1999) “A Microfabricated Fluorescence-Activated Cell Sorter,” Nat. Biotechnol. 17:1109-1111; Unger, M., et al. (1999) “Single Molecule Fluorescence Observed with Mercury Lamp Illumination,” Biotechniques 27:1008-1013; and Chou, H. P. et al. (1999) “A Microfabricated Device for Sizing and Sorting DNA Molecules,” Proc. Nat'l. Acad. Sci. 96:11-13. These sorting techniques utilizing microfabricated FACSs generally involve focusing cells using microchannel geometry and can be adapted to the present invention by the inclusion of a chip-based FACS system in the in vitro transcription/translation module of the system.

[0405] The following example provides details regarding use of liposomes as reaction vesicles.

[0406] (1.) Alternate Format: In vitro Clone Selection: Direct Isolation of Active Sequences from a DNA Library - Use of Liposomes in the Integrated Systems of the Invention

[0407] The slowest step in the manipulation of DNA is often the selection of functional DNA constructs in vivo. That is, DNA is often maintained in a form suitable for transformation and growth in a host organism, such as E. coli, to allow the selection of positive constructs from the background. This example describes functional assays to be performed on the gene product, which is transcribed directly from a DNA library, leading to the isolation of the specific construct bearing the desired activity. The technique is amenable to the screening of libraries of any size.

[0408] This example relies upon the application of a number of techniques in series. In particular, the example uses liposomes as reaction/sorting compartments, in vitro transcription/translation, a fluorescent activity assay and a FACS machine.

[0409] The use of in vitro transcription/translation systems to produce small amounts of protein from DNA in solution is described above. The encapsulation of this machinery inside a small compartment (~1 μm), such as an inverted micelle (Tawfik and Griffiths (1998) Nature Biotech, 16:652-656) or liposome, enables the machinery to act upon a single DNA molecule. The presence of 1 molecule in a 1 μm diameter sphere corresponds to a concentration of ~2.5 nM. Thus, the effective concentration of the DNA is sufficient for efficient transcription/translation and even a single round of translation gives a useful protein concentration. A single turnover of the enzyme encoded by the DNA also gives nM concentrations of product; therefore, e.g., about 100 catalytic events are sufficient for detection. Detection of the resulting fluorescence by the laser of the FACS machine will then lead to the sorting of the fluorescent compartments (liposomes only, as inverted micelles are incompatible with the FACS machine). In general, FACS machines sort liposomes, cells or other sortable compartments at a rate of thousands per second, which allows millions of liposomal reaction compartments to be sorted routinely. The selected liposomes can then be degraded and the formerly encapsulated DNA isolated and purified. The DNA that encoded a gene product (or products) capable of generating fluorescence under the assay conditions is substantially present in this sample. This DNA is further analyzed or used directly in another cycle of this process under more stringent conditions.
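The single-molecule concentration discussed above can be checked with a short calculation. The following sketch assumes an ideal sphere and computes the molar concentration of a given number of molecules in a compartment of a given diameter; the function names are illustrative only. For one molecule in a 1 μm diameter sphere it gives a value of a few nanomolar, the same order as the approximate figure quoted above, and roughly one hundred times that for 100 product molecules.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def sphere_volume_litres(diameter_um):
    """Volume of an ideal sphere, in litres, from its diameter in micrometres."""
    radius_m = diameter_um * 1e-6 / 2.0
    volume_m3 = (4.0 / 3.0) * math.pi * radius_m ** 3
    return volume_m3 * 1000.0  # 1 cubic metre = 1000 litres


def molar_concentration(n_molecules, diameter_um):
    """Molar concentration of n molecules confined in a spherical compartment."""
    return n_molecules / (AVOGADRO * sphere_volume_litres(diameter_um))


# One DNA molecule, and 100 product molecules, in a 1 um diameter compartment.
single_nM = molar_concentration(1, 1.0) * 1e9
hundred_nM = molar_concentration(100, 1.0) * 1e9
print(f"1 molecule:    {single_nM:.1f} nM")
print(f"100 molecules: {hundred_nM:.0f} nM")
```

Because the volume scales with the cube of the diameter, halving the compartment diameter raises the effective concentration of a single molecule eightfold.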

[0410] For example, Tawfik and Griffiths, id., describe a system in which linear DNA encoding a DNA methylase was isolated from a background of other DNA. The DNA was encapsulated in inverted micelles with suitable transcription/translation machinery, such that only one DNA molecule was encapsulated in each micelle. After the DNA methylase had been translated, it methylated the DNA accessible to it, i.e., present in that micelle. The reaction was quenched and the DNA was isolated from the micelles. The pooled DNA was then exposed to the restriction enzyme corresponding to the methylase, leading to the degradation of unmethylated sequences. The intact DNA was then amplified by PCR and the DNA was found to be highly enriched in the methylase encoding sequence.

[0411] A solution of the in vitro transcription/translation machinery with the substrates required for the activity assay is provided, at concentrations sufficient to ensure that each liposome contains a self-sufficient transcription/translation/gene product assay system, in a suitable buffer, at 4° C. A DNA library is added at a concentration such that generally only about one or zero DNA molecule(s) are present in each liposome.
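One way to reason about a loading concentration at which liposomes generally hold one or zero DNA molecules is to treat encapsulation as a random (Poisson) process. That statistical model is an assumption made here for illustration; it is not stated in the text above. The sketch below reports the expected fractions of empty, singly occupied and multiply occupied liposomes for a chosen mean number of DNA molecules per liposome.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k molecules in a compartment (Poisson model)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def occupancy(mean_per_liposome):
    """Fractions of liposomes holding zero, one, or more than one DNA molecule."""
    empty = poisson_pmf(0, mean_per_liposome)
    single = poisson_pmf(1, mean_per_liposome)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}


# Hypothetical loading levels: mean DNA molecules per liposome.
for mean in (0.05, 0.1, 0.5, 1.0):
    occ = occupancy(mean)
    print(f"mean={mean:<4}  empty={occ['empty']:.3f}  "
          f"single={occ['single']:.3f}  multiple={occ['multiple']:.4f}")
```

Under this model, lowering the mean loading reduces the fraction of liposomes carrying more than one construct at the cost of more empty liposomes, which is tolerable when compartments are sorted at high rates.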

[0412] The liposomes are formed using a solvent dispersal method, which allows the direct formation of small unilamellar vesicles (SUVs) of defined size in the starting solution. The starting solution is stirred at a predetermined speed and the lipids are added to the solution in a water miscible solvent. As the solvent disperses (solvent is typically less than 2% final concentration) the lipids are exposed to the aqueous phase, which causes them to spontaneously form SUVs of a size defined by the conditions and the choice of lipid mixture. In a typical experiment 30% of the initial solution will be encapsulated in liposomes. The liposomes are purified and the remaining unencapsulated solution can be recycled if desired. The liposomes are then incubated under conditions that favor transcription/translation and later conditions suitable for the activity assay of interest.

[0413] The stability of the liposomes and their behavior in solution can be controlled by the choice of the constituent lipids, which form the bilayer. Thus, the compartment for reaction can be tailored to fit the conditions necessary for a specific experiment. Fluorescent lipids can also be incorporated into the bilayer, which can be used as an internal standard for fluorescence produced in the gene product assay, e.g., in the FACS machine.

[0414] Gene product can be assayed using any of the standard fluorescent formats, such as the production/consumption of a fluorophore in the reaction, fluorescence resonance energy transfer (FRET), or coupled assays that use the product of the reaction performed by the gene product as the substrate for another reaction which generates a fluorophore. The tiny volume of reaction (~4 femtolitres for a ~1 μm diameter vesicle) increases the sensitivity of the solution to changes in the number of ions such as H⁺ (i.e., pH) and Ca²⁺ for which specific fluorescent detection methods are available. Fluorescent methods are the most commonly used assays for most enzyme classes, which provides general utility for this system.

[0415] Once sufficient time has been allowed for the gene product to perform its reaction, the liposome suspension is sorted using a FACS machine. Particles of ~1 μm diameter are readily visualized/sorted at a rate of thousands per second by this technology. Thus, the liposomes which are sufficiently fluorescent (and thus contain an active gene product/DNA construct) are separated from the many which do not meet the predefined criteria. The DNA is then purified from the sorted liposome population using standard methodology.
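The sorting decision described above can be sketched in simplified form. The example below assumes that each sorted event carries two readings, an assay signal and a reference signal from a fluorescent lipid used as an internal standard, and that a compartment is selected when the ratio of the two meets a predefined threshold. The event records, field names, threshold and the rate of 5,000 events per second are hypothetical values chosen for illustration, not properties of any particular FACS instrument.

```python
def gate_events(events, ratio_threshold):
    """Return ids of compartments whose signal/reference ratio meets the gate.

    Events without a usable reference reading are skipped rather than sorted.
    """
    selected = []
    for event in events:
        if event["reference"] <= 0:
            continue
        if event["signal"] / event["reference"] >= ratio_threshold:
            selected.append(event["id"])
    return selected


def sort_time_seconds(n_compartments, events_per_second):
    """Time needed to pass a given number of compartments through the sorter."""
    return n_compartments / events_per_second


# Hypothetical readings in arbitrary fluorescence units.
events = [
    {"id": "L1", "signal": 900.0, "reference": 100.0},
    {"id": "L2", "signal": 120.0, "reference": 110.0},
    {"id": "L3", "signal": 450.0, "reference": 50.0},
    {"id": "L4", "signal": 300.0, "reference": 0.0},
]
print(gate_events(events, ratio_threshold=5.0))
print(sort_time_seconds(10_000_000, 5_000) / 60.0, "minutes for ten million compartments")
```

Normalizing to the reference signal means that a small but active liposome (L3 above) is scored the same as a larger one with the same activity per unit of membrane.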

[0416] This approach confers a number of advantages over traditional cloning protocols. Firstly, the entire screening process is performed in a single batch, limiting the number of liquid handling steps, so that there is virtually no limit to the size of library that can be screened in a single run. The only time the DNA constructs are handled individually is when they are sorted in the FACS machine, allowing extremely high throughput screens to be performed. Further, any gene product is handled equally efficiently, with no problems associated with host organism toxicity, protease mediated degradation, or the like. Even membrane associated proteins are screenable, due to the lipid bilayer nature of the liposomes.

[0417] Equally powerful particle screening methods are available by use of quantitative (e.g., digital) imaging in association with visible or fluorescent microscopy. In such methods, a library of particles producing a quantifiable emission is distributed on a surface in such a way as to maintain reasonably fixed positions. Visualization and quantification of emission of light from particle(s) or specified sub-area(s) (as in a grid) is conducted by one of a variety of available microscopic devices operatively linked to a digital imaging camera. Optionally, these components may be linked to a computer or other high-speed computational device equipped with software capable of correcting for lens curvature, unequal background within the field of view, and the like. Such imaging hardware and software can be used to guide (manually or electronically) the selective ‘picking’ or removal of particles from the surface. Such particles are then processed, characterized and arrayed as described elsewhere within this disclosure. Particularly useful for the selective ‘picking’ of particles from a surface are micromanipulation tools such as capillary-actuated clamping devices of the kind that find use in ion channel and patch clamp studies, optical and atomic tweezers, micropipets, syringes, and the like.
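A much simplified version of the image-guided picking described above is sketched below. It assumes the image is a small matrix of pixel intensities, estimates a single background level as the median pixel value (a simplification; correcting an unequal background across the field of view would need a position-dependent estimate), sums the corrected emission within each grid sub-area and reports the sub-areas whose total meets a threshold. All names, the grid size and the intensity values are hypothetical.

```python
from statistics import median


def subtract_background(image):
    """Subtract a uniform background (median pixel value), clipping at zero."""
    background = median(pixel for row in image for pixel in row)
    return [[max(pixel - background, 0) for pixel in row] for row in image]


def grid_totals(image, cell_size):
    """Sum pixel intensities within each square grid sub-area."""
    totals = {}
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            key = (r // cell_size, c // cell_size)
            totals[key] = totals.get(key, 0) + pixel
    return totals


def cells_to_pick(image, cell_size, threshold):
    """Grid coordinates whose background-corrected emission meets the threshold."""
    totals = grid_totals(subtract_background(image), cell_size)
    return sorted(cell for cell, total in totals.items() if total >= threshold)


# Hypothetical 4 x 4 image with two emitting particles on a flat background.
image = [
    [10, 10, 10, 10],
    [10, 60, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 90],
]
print(cells_to_pick(image, cell_size=2, threshold=40))
```

The returned grid coordinates could then be passed to whatever picking tool is in use, with the threshold set according to the emission level considered significant for the assay.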

[0418] Furthermore, because the only components in the system are added by design, there is no interference from overlapping activities of other proteins, etc., leading to a low background and the ability to detect very low levels of activity. Similarly, because no living organism is involved in the process, sensitive or dangerous gene products such as antibiotic resistance genes and factors which mediate infection can be studied without risk of transferring the new activity to pathogens and, therefore, the safety concerns for the systems are relatively reduced. Finally, results of an experiment can be produced quickly without waiting for an incubation period, which is a particular advantage where the host organism would otherwise be a slow growing yeast or mold.

[0419] In addition to liposomes, individual or pooled nucleic acid populations with relevant in vitro transcription or translation reagents may be encapsulated within agar, agarose, carrageenan, guar and related biological gels and gums; or in a wide variety of hygroscopic synthetic polymers such as polyacrylates, polymethylmethacrylates, polyacrylamides, polyethyleneimine (crosslinked) membranes, or the like. Methods for using these substances to encapsulate biological materials are known in the art. For example, microdroplets are formed by flowing a mixture of the polymerizing or pre-gelled polymer with a mixture containing the biochemical components of interest. Microdroplet technology is described, e.g., in Weaver et al. (1993) “Microdrop technology: A General Method for Separating Cells by Function and Composition” METHODS: A Companion to Methods in Enzymology 2(3):234-247.

[0420] The resulting mixture is passed through a mechanical or aspirating device capable of atomizing the stream into microdroplets of desired size or characteristics. Such microdroplets can be sprayed onto a surface, plate, preformed grid, or the like, directly from the atomizing device, or passed into a separate aspirator, nozzle or ink-jet-like device. Commonly, the particles can be sprayed in a random or semi-random manner onto the target surface and allowed to retain a relatively fixed position either by surface tension, gel adhesion or maintenance of a low moisture or low-eddy current capillary layer on a gel or moist surface. The positions of the quantified particles may be used to establish and record an initial array, or the particles of interest may be picked and repositioned in a more regular pattern to establish the functional array.

[0421] This embodiment facilitates the process of developing biological catalysts for novel functions by giving a direct connection between DNA structure and gene product activity and by decreasing the time required for the iterative evolution of novel activities.

[0422] (2.) Alternate Format: Localizing In Vitro Transcription/Translation Products

[0423] Methods of detecting or enriching for in vitro transcription or translation products are provided. In the methods, one or more first nucleic acids (e.g., shuffled or otherwise diversified nucleic acids) which encode one or more moieties are localized proximal to one or more moiety recognition agents which specifically bind the one or more moieties. The one or more nucleic acids are in vitro translated or transcribed, producing the one or more moieties (e.g., polypeptides or biologically active RNAs such as anti-sense or ribozyme molecules, or other product molecules). The one or more moieties diffuse or flow into contact with the one or more moiety recognition agents (e.g., antibodies, antigens, etc.). Binding of the one or more moieties to the one or more moiety recognition agents is permitted and the one or more moieties are detected or enriched for by detecting or collecting one or more materials proximal to, within or contiguous with the moiety recognition agent (the material comprises at least one of the one or more moieties, where the moieties comprise one or more in vitro translation or transcription product). Optionally, the one or more moieties are pooled by pooling the material which is collected. Here again, a variety of variants of this basic class of methods are set forth herein, as are a variety of products produced by the methods and their variants.

[0424] For example, the first nucleic acids can include a related population of shuffled nucleic acids which encode an epitope tag, which is bound by the moiety or one or more moiety recognition agents. The first nucleic acids can include transcription or translation control sequences, such as an inducible or constitutive heterologous (or non-heterologous) promoter. In some embodiments, the first nucleic acids include a related population of shuffled nucleic acids and a PCR primer binding region, the method further including PCR amplifying a set of parental nucleic acids to produce the related population of shuffled nucleic acids.

[0425] Optionally, the first nucleic acids can include a related population of shuffled nucleic acids and a PCR primer binding region. In this case, the method can include identifying one or more target first nucleic acid by proximity to the moieties which are bound to the one or more moiety recognition agent, and amplifying the target first nucleic acid by hybridizing a PCR primer to the PCR primer binding region and extending the primer with a polymerase.

[0426] The first nucleic acids and the one or more moiety recognition agents can be localized on a solid substrate (including membranes, beads and other substrates commonly available), or in a gel or other matrix that limits diffusion of the moiety recognition agents or the nucleic acids. The first nucleic acids and the one or more moiety recognition agents can be localized on the solid substrate by a cleavable linker, a chemical linker, a gel, a colloid, a magnetic field, an electrical field, a combination thereof, or the like. In one aspect, the moiety, or the moiety in contact with the moiety recognition agent, can release the nucleic acid, e.g., where the moiety recognition agent cleaves a cleavable linker which attaches the first nucleic acid to a solid substrate.

[0427] Typically, the invention can include detecting an activity of the moiety or moiety recognition agent. The one or more first nucleic acid can then be picked with an automated robot, providing for recovery of the nucleic acid and further processing. For example, the one or more first nucleic acid can be picked by placing a capillary on a region comprising the detected activity of the moiety or moiety recognition agent and withdrawing the capillary.

[0428] Example: Enrichment Method of In vitro Transcription/Translation Products

[0429] FIG. 17, Panels A-E schematically show an embodiment in which products of in vitro transcription/translation (ivTT) are captured on a solid substrate or in a matrix for further analysis, e.g., via immobilized antibodies or other protein capture mechanisms. As shown, both in vitro transcription and translation products can be captured on a single substrate, providing a mechanism for direct identification and isolation of genes of interest on the substrate.

[0430] As shown, an oligonucleotide “hook” is used to capture shuffled or otherwise diversified genes (the hook can hybridize to a region that is held constant in the shuffling or other diversification reaction) to the substrate (which may be any of the substrates herein, including beads, membranes, slides, trays, etc.). Alternately, the oligo can bind a universal epitope on a PCR primer of interest that is incorporated into the gene, e.g., a biotin or other molecule. The gene is in vitro transcribed/translated, with the product being captured by an appropriate binding moiety (if the product is a protein, an antibody can be used as the binding moiety; if the product is an RNA, a second capture nucleic acid can be used as the binding moiety). For example, the surface (e.g., plate/bead/well) can be coated with oligos, antibodies, or both. For oligo capture tags, the sequences optionally bind to generic sequence handles. The tags can include a variety of features, including primer sequences for PCR. The oligos can include features for direct capture such as biotin or any other tag that can be linked to the oligo, e.g., through a chemical linkage, which optionally can include a linker region. The oligos can be cleavable (e.g., through incubation with a restriction enzyme). Similarly, cleavage itself can be a marker of activity, e.g., where a restriction enzyme or variant thereof is the molecule to be tested. Similarly, the activity to be tested can be a reporter system that results in cleavage of the capture tag. In the case of antibody tags, the tags can provide for uniform display of active sites and can be used in a project independent fashion, e.g., in any system where the antibody ligand is present.

[0431] As shown, the product binds to the binding moiety in proximity to the captured gene. Any activity of the product is then detected. The coding nucleic acid is isolated by its proximity to the detected product, e.g., using a microcapillary or the like. For example, the product can produce a visible signal when active and the system can detect the signal (e.g., by signal region size, signal intensity, etc.) and select the corresponding region for isolation of the coding nucleic acid. In bead-based embodiments, nucleic acids can be selected by FACS or other fluorescence detection methods. The use of the hook to capture DNA offers many control point options, including, e.g., cleavage by a variant.

[0432] In one embodiment, which is shown in FIG. 17B, the product has an activity which results in cleavage of proximal bound coding nucleic acids. However, depending on the nature of the substrate or matrix, any available method can be used for cleavage of the coding nucleic acid, including chemical cleavage, light-directed cleavage, treatment with a restriction enzyme, or the like. The oligonucleotide hook can also include a cleavable linking element, as is common in the art.

[0433] As shown, genes are transcribed from a promoter such as a T7 promoter, translated and the activity of the encoded variant enzyme detected. In the format depicted, the variant enzyme includes a capture region that permits immobilization and detection. Free (e.g., soluble) genes transcribed in the same region are isolated. The process is repeated until a desired enrichment is observed. The tether on the gene or the transcribed enzyme or the constant region of the enzyme variant can be cleaved, e.g., specifically. Such specifically cleaved materials can be specifically eluted or otherwise isolated from the system. Examples of such cleavable linkers include a cleavable substrate or substrate analog, e.g., for detection of an activity of the variant protein (e.g., upon binding/cleavage by the protein variant, e.g., where the protein is an enzyme). Similarly, cleavage can be dependent on formation of a desired side product such as peroxide, heat, light, electricity or the like.

[0434] It is helpful to limit diffusion in this system, because, as the transcription and/or translation product diffuses away from the tethered coding gene, the association between the tethered gene and the encoded products becomes more difficult to determine. Diffusion can be limited by any available method, including allowing for transcription/translation in a matrix that limits diffusion (e.g., a gel or polymer solution).

[0435] FIG. 17, panel C shows details of one embodiment using generic epitope tags. As shown, the tags provide for uniform display of the various active sites of the protein or other bio-molecule of interest. This provides for project-independent use of the tags as well as for the use of common reagents. Common tags such as His-tag IMAC can be used, as can any fusion protein comprising a region to be used as a tag. The system also provides for common treatment such as free thiol introduction and the like.

[0436] As shown in FIG. 17, panel D, a robotic system such as the commercially available Q-bot can be used to pick positive regions of the substrate (e.g., to capture free genes prior to diffusion from a site of interest). Picking can be performed according to any standard hit picking selection criteria, e.g., selection of a particular percentage of variants by the size/intensity produced by a product at a site of activity/expression. Alternatively, a bead-based protocol can be used in conjunction with FACS if a fluorescent product is formed. In either case, genes which are selected can be used as inputs for subsequent rounds of recombination or mutation (or both) and screening, or can simply be used as product candidates. The products can also be further screened, in pools or as single hits, using any appropriate assay.

[0437] As shown in FIG. 17, panel E, DNAs which are recovered are subjected to amplification reactions such as PCR or LCR and the amplified products are subjected to any additional diversity generation, isolation or selection step which is selected by the user or the system. As depicted, recovery in this example is performed via a microcapillary approach (e.g., using the Q-bot) and the recovered material is then subjected to RT-PCR to produce products that, again, can be used in subsequent recombination/mutation procedures or for any of the other purposes noted herein. It is worth noting that the density of variant genes of interest is inversely proportional to the enrichment of components in the system. Thus, to avoid bystander effects, the density of variant genes should not be too high for accurate selection by whatever selection mechanisms are used (capillary, FACS, etc.).

[0438] These methods can also be adapted to in vivo systems by lysing cells and capturing cell components. Systems for cell lysis and capture of nucleic acids such as Xpress-Screen™ from Tropix PE Biosystems (Bedford, Mass.) can be adapted for use with this embodiment of the invention.

[0439] C. High-Throughput Cloning and Expression

[0440] In addition to in vitro transcription/translation, high throughput cloning and expression can be used to generate products to screen for product activity. This approach has the advantage of expressing products in a system that is similar to the eventual intended expression site for many products (e.g., in cells).

[0441] Basic cloning methodology is set forth in Sambrook, Ausubel and Berger, supra. In the present high-throughput system, diversified nucleic acids (e.g., shuffled DNAs) are transformed into cells. The cells are sorted (e.g., by FACS, micro-FACS, visual or fluorescence microscopy) by expression of a marker protein such as GFP, where the marker expression is encoded by a full-length copy of a corresponding nucleic acid, e.g., where the full-length nucleic acid also encodes a full-length product of interest. Cells that have been selected are transferred to a micro-chamber or array where they express the shuffled gene. The micro-chamber or array contains a substrate for the shuffled protein whose optical properties (i.e., absorbance or fluorescence) are changed by catalysis by the enzyme. After a period of time (e.g., ca. minutes to hours), the array of micro-chambers is “read” with a laser, CCD camera or other high density optical device. Those chambers in which the change in optical properties exceeds some threshold (i.e., a defining activity) are emptied, one into each well of a high density microtiter plate (96, 384, 1500 well, etc.), and the cells are then grown for the second assay. This provides a high-throughput format as a pre-screen for active clones.

[0442] Cells containing shuffled or mutated genes can express a protein or pathway capable of providing a fluorescent signal directly. In such a case, the cell supplies the translation and, optionally, the transcriptional machinery, and required substrates are loaded by incubating cells in a mixture appropriate for delivering the substrate through the cell wall. Cells expressing either marker or library genes of interest are sorted and arrayed or collected on the basis of the emitted fluorescence signal. Such a signal may also derive from the scattering, or direct emission or absorbance, of visible light from the individual cells.

[0443] Several alternatives to traditional FACS devices exist and provide particular advantages to the present invention. For example, microfluidic systems (see, e.g., Fu A Y, Spence C, Scherer A, Arnold F H and Quake S R (1999) “A microfabricated fluorescence-activated cell sorter” Nat Biotechnol. 17(11): 1109-11) provide an efficient alternative to traditional FACS devices. Such systems are typically microfabricated devices capable of flowing, detecting and sorting cells from a microfluidic stream. Such systems can have several advantages over traditional FACS in that they allow for reversible fluid flow, extraordinarily high sorting accuracy, parallel sorting of multiple samples and the sorting of particles which are below the limit of conventional FACS devices (e.g., bacteria, phage, phagemids, sub-microparticles, and the like).

[0444] In addition, a variety of powerful particle and cell screening methods are available by use of quantitative (e.g., digital) imaging in association with visible or fluorescent microscopy. In such methods, a library of cells producing quantifiable emission(s) is distributed on a surface in such a way as to maintain reasonably fixed positions. Visualization and quantification of emission of light from each particle or specified sub-area (as in a grid) is conducted by one of a variety of available microscopic devices operatively linked to a digital imaging camera. Optionally, these components may be linked to a computer or other high-speed computational device equipped with software capable of correcting for lens curvature, unequal background within the field of view, and the like. Such imaging hardware and software can be used to guide (manually or electronically) the selective ‘picking’ or removal of particles from the surface. Such particles are then processed, characterized and arrayed as described elsewhere within this disclosure. Particularly useful for the selective ‘picking’ of particles from a surface are micro-manipulation tools such as capillary-actuated or suction-actuated clamping devices, such as find use in ion channel and patch clamp studies, optical and atomic tweezers, micropipets and syringes, and the like.

[0445] D. Product Deconvolution

[0446] During operation of the device, the array of reaction mixtures produces an array of reaction mixture products (e.g., biologically active nucleic acids or proteins). These biologically active nucleic acids or proteins are screened for at least one property to identify coding nucleic acids of interest. Thus, in one significant aspect, the device or integrated system herein has one or more product identification or purification modules. These product identification/purification modules identify and/or purify one or more members of the array of reaction mixture products.

[0447] Common methods of assaying for product activity include any of those available in the art, including enzyme and/or substrate assays, cell-based assays, reporter gene expression, second messenger induction or signaling, etc.

[0448] In addition to product identification or purification, product identification or purification modules can also include an instruction set for discriminating between members of the array of reaction products based upon detectable characteristics, such as a physical characteristic of the products, an activity of the products or reactants, and concentrations of the products or reactants. For example, “hit picking” software is available which permits the user to select criteria to identify members of an array that display one or more activities sufficient to be of interest for further analysis.
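By way of illustration only, the kind of selection criterion applied by such an instruction set can be sketched as follows. The function name, the well identifiers and the activity values are hypothetical and are not taken from any particular commercial hit picking package; the sketch simply ranks array members by a measured activity and keeps those that meet a user-selected threshold and/or fall within a user-selected top fraction.

```python
from typing import Dict, List, Optional


def pick_hits(
    activity: Dict[str, float],
    threshold: Optional[float] = None,
    top_fraction: Optional[float] = None,
) -> List[str]:
    """Return array member IDs, most active first, that satisfy the criteria.

    Both criteria are illustrative assumptions: `threshold` is a minimum
    activity, `top_fraction` keeps that fraction of the whole array.
    """
    ranked = sorted(activity, key=activity.get, reverse=True)
    if threshold is not None:
        ranked = [well for well in ranked if activity[well] >= threshold]
    if top_fraction is not None:
        keep = max(1, round(len(activity) * top_fraction))
        ranked = ranked[:keep]
    return ranked


# Hypothetical activity readings for four wells of a plate.
plate = {"A1": 0.12, "A2": 0.95, "A3": 0.40, "A4": 0.88}
print(pick_hits(plate, threshold=0.5))      # ['A2', 'A4']
print(pick_hits(plate, top_fraction=0.25))  # ['A2']
```

A threshold suits assays with a known baseline, while a top fraction fixes the number of members carried into the next round regardless of absolute signal.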

[0449] The product identification module can include detection and/or selection modules which facilitate detection or selection of array members. Such modules can include, e.g., an array reader which detects one or more members of the array of reaction products. Array readers are commercially available, generally constituting a microscope or CCD and a computer with appropriate software for identifying or recording information. In particular, array readers which are designed to interface with standard microtiter trays and other common array systems are commercially available. In addition to product information from many of the various product manufacturers noted herein, detection protocols and systems are well known. For example, basic bioluminescence methods and detection methods are described in LaRossa Ed. (1998) Bioluminescence Methods and Protocols: Methods in Molecular Biology Vol. 102, Humana Press, Totowa, N.J. Basic light microscopy methods, including digital image processing, are described, e.g., in Shotton (ed) (1993) Electronic Light Microscopy: Techniques in Modern Biomedical Microscopy Wiley-Liss, Inc. New York, N.Y. Fluorescence microscopy methods are described, e.g., in Hergman (1998) Fluorescence Microscopy Bios Scientific Publishers, Oxford, England. Specialized imaging instruments and methods for screening large numbers of images have also been described, e.g., “MICROCOLONY IMAGER INSTRUMENT FOR SCREENING CELLS EXPRESSING MUTAGENIZED ENZYMES” U.S. Pat. No. 5,914,245 to Bylina et al.; “ABSORBTION SPECTRA DETERMINATION METHOD FOR HIGH RESOLUTION IMAGING MICROSCOPE . . . ” U.S. Pat. No. 5,859,700 to Yang; “CALIBRATION OF FLUORESCENCE RESONANCE ENERGY IN MICROSCOPY . . . ” WO 9855026 (Bylina et al.); “OPTICAL INSTRUMENT HAVING A VARIABLE OPTICAL FILTER” Yang and Youvan U.S. Pat. No. 5,852,498; Youvan (1999) “Imaging Spectroscopy and Solid Phase Screening” IBC World Congress on Enzyme Technologies and http://www.kairos.com/. These systems can be incorporated into the present invention to provide high-throughput screening systems.

[0450] Similarly, such modules can include any of: an enzyme which converts one or more members of the array of reaction products into one or more detectable products; a substrate which is converted by the one or more members of the array of reaction products into one or more detectable products; a cell which produces a detectable signal upon incubation with the one or more members of the array of reaction products; a reporter gene which is induced by one or more members of the array of reaction products; a promoter which is induced by one or more members of the array of reaction products, which promoter directs expression of one or more detectable products; an enzyme or receptor cascade which is induced by the one or more members of the array of reaction products; or the like.

[0451] Further, where a non-standard array format is used, or where non-standard assays are to be detected by the array reader, common detector elements can be used to form an appropriate array reader. For example, common detectors include, e.g., spectrophotometers, fluorescent detectors, microscopes (e.g., for fluorescent microscopy), CCD arrays, scintillation counting devices, pH detectors, calorimetry detectors, photodiodes, cameras, film, and the like, as well as combinations thereof. Examples of suitable detectors are widely available from a variety of commercial sources known to persons of skill.

[0452] Signals are preferably monitored by the array reader, e.g., using an optical detection system. For example, fluorescence-based signals are typically monitored using, e.g., laser-activated fluorescence detection systems which employ a laser light source at an appropriate wavelength for activating the fluorescent indicator within the system. Fluorescence is then detected using an appropriate detector element, e.g., a photomultiplier tube (PMT), CCD, microscope, or the like. Similarly, for screens employing colorimetric signals, spectrophotometric detection systems are employed which direct a light source at the sample and provide a measurement of absorbance or transmissivity of the sample. See also, The Photonics Design and Applications Handbook, books 1, 2, 3 and 4, published annually by Laurin Publishing Co., Berkshire Common, P.O. Box 1146, Pittsfield, Mass. for common sources for optical components.

[0453] In alternative aspects, the array reader comprises non-optical detectors or sensors for detecting a particular characteristic of the system. Such sensors optionally include temperature sensors (useful, e.g., when a product produces or absorbs heat in a reaction, or when the reaction involves cycles of heat as in PCR or LCR), conductivity, potentiometric (pH, ions), amperometric (for compounds that can be oxidized or reduced, e.g., O₂, H₂O₂, I₂, oxidizable/reducible organic compounds, and the like), mass (mass spectrometry), plasmon resonance (SPR/BIACORE), chromatography detectors (e.g., GC) and the like.

[0454] For example, pH indicators which indicate pH effects of receptor-ligand binding can be incorporated into the array reader, where slight pH changes resulting from binding can be detected. See also, Weaver, et al., Bio/Technology (1988) 6: 1084-1089.

[0455] As noted, one conventional system carries light from a specimen field to a CCD camera. A CCD camera includes an array of picture elements (pixels). The light from the specimen is imaged on the CCD. Particular pixels corresponding to regions of the substrate are sampled to obtain light intensity readings for each position. Because multiple positions are processed in parallel, the time required to determine the intensity of light from each position is reduced. Many other suitable detection systems are known to one of skill.
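The pixel sampling step can be sketched, under simplifying assumptions, as averaging the pixels that fall within each region of the substrate. The rectangular region layout, the position names and the tiny 4×4 image below are hypothetical and serve only to make the mapping from pixels to array positions concrete.

```python
from typing import Dict, List, Tuple

# (row_start, row_end, col_start, col_end); end indices are exclusive.
Region = Tuple[int, int, int, int]


def region_intensities(
    image: List[List[float]], regions: Dict[str, Region]
) -> Dict[str, float]:
    """Return the mean pixel intensity for each named region of a CCD frame."""
    readings = {}
    for name, (r0, r1, c0, c1) in regions.items():
        pixels = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
        readings[name] = sum(pixels) / len(pixels)
    return readings


# Hypothetical 4x4 frame covering a 2x2 array of positions.
frame = [
    [10, 10, 50, 50],
    [10, 10, 50, 50],
    [0, 0, 90, 90],
    [0, 0, 90, 90],
]
layout = {
    "A1": (0, 2, 0, 2),
    "A2": (0, 2, 2, 4),
    "B1": (2, 4, 0, 2),
    "B2": (2, 4, 2, 4),
}
print(region_intensities(frame, layout))
```

All positions are read from a single frame, which is what makes the readout effectively parallel.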

[0456] Data obtained (and, optionally, recorded) by the detection device is typically processed, e.g., by digitizing image data and storing and analyzing the image in a computer system. A variety of commercially available peripheral equipment and software is available for digitizing, storing and analyzing a signal or image. A computer is commonly used to transform signals from the detection device into sequence information, reaction rates, or the like. Software for determining reaction rates or monitoring formation of products is available or can easily be constructed by one of skill using a standard programming language such as Visual Basic, Fortran, Basic, Java, or the like, or can even be programmed into simple end-user applications such as Excel or Access. Any controller or computer optionally includes a monitor which is often a cathode ray tube (“CRT”) display, a flat panel display (e.g., active matrix liquid crystal display, liquid crystal display), or others. Computer circuitry is often placed in a box which includes numerous integrated circuit chips, such as a microprocessor, memory, interface circuits, and others. The box also optionally includes a hard disk drive, a floppy disk drive, a high capacity removable drive, and other elements. Inputting devices such as a keyboard, mouse or touch screen optionally provide for input from a user.
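As one hedged illustration of software for determining reaction rates, the following sketch estimates a rate as the least-squares slope of detector signal versus time. The choice of a linear fit (appropriate only while the signal rises linearly), the function name and the example readings are assumptions made for illustration.

```python
from typing import Sequence


def reaction_rate(times: Sequence[float], signals: Sequence[float]) -> float:
    """Least-squares slope of signal versus time (signal units per time unit)."""
    n = len(times)
    if n < 2 or n != len(signals):
        raise ValueError("need at least two paired time/signal readings")
    mean_t = sum(times) / n
    mean_s = sum(signals) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, signals))
    return sxy / sxx


# Hypothetical absorbance readings taken every 10 seconds.
rate = reaction_rate([0, 10, 20, 30], [0.05, 0.15, 0.25, 0.35])
print(round(rate, 4))  # 0.01 absorbance units per second
```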

[0457] In addition to array readers, the product deconvolution module can include enzymes which convert one or more members of the array of reaction products into one or more detectable products, or substrates which are converted by the array of reaction products into one or more detectable products, or other features that provide for detection of product activity by direct or indirect detection formats. For example, the module can include cells which produce a detectable signal upon incubation with members of the array of reaction products, and reporter genes which are induced by one or more members of the array of reaction products. Similarly, the module can include promoters which are induced by one or more array members and, e.g., which direct expression of one or more detectable products. Enzyme or receptor cascades can be triggered which are induced by the one or more members of the array of reaction products, with any of the products of the cascade serving as a detectable event.

[0458] Any available system for detecting proteins or nucleic acids or other expression products (directly or indirectly) can be incorporated into the module. Common product identification or purification elements include size/charge-based electrophoretic separation units such as gels and capillary-based polymeric solutions, as well as affinity matrices, liposomes, microemulsions, microdroplets, plasmon resonance detectors (e.g., BIACOREs), GC detectors, epifluorescence detectors, fluorescence detectors, fluorescent arrays, CCDs, optical sensors (e.g., an ultraviolet or visible light sensor), FACS detectors, temperature sensors, mass spectrometers, stereo-specific product detectors, coupled H₂O₂ detection systems, enzymes, enzyme substrates, ELISA reagents or other antibody-mediated detection components (e.g., an antibody or an antigen), mass spectroscopy, or the like. The particular system to be used depends on the system at issue, the throughput desired and available equipment.

[0459] In selected embodiments, the product identification or purification modules include one or more of: a gel, a polymeric solution, a liposome, a microemulsion, a microdroplet, an affinity matrix, a plasmon resonance detector, a BIACORE, a GC detector, an ultraviolet or visible light sensor, an epifluorescence detector, a fluorescence detector, a fluorescent array, a CCD, a digital imager, a scanner, a confocal imaging device, an optical sensor, a FACS detector, a micro-FACS unit, a temperature sensor, a mass spectrometer, a stereo-specific product detector, an ELISA reagent, an enzyme, an enzyme substrate, an antibody, an antigen, mass spectroscopy, a refractive index detector, a polarimeter, a pH detector, a pH-stat device, an ion selective sensor, a calorimeter, a film, a radiation sensor, a Geiger counter, a scintillation counter, a particle counter, or an H₂O₂ detection system.

[0460] The product detection module can also include a substrate addition module which adds one or more substrates to a plurality of members of the product array or the secondary product array, e.g., where the product has an activity on the substrate. In this embodiment, the device will include a substrate conversion detector which monitors formation of a secondary product produced by contact between the substrate and one or more products. Formation of the product can be monitored directly or indirectly, or formation can be monitored by monitoring the substrate directly or indirectly (e.g., formation of the product can be monitored by monitoring loss of the substrate over time). Primary or secondary product formation can be monitored chemo-, regio- or stereoselectively, or non-selectively.

[0461] Formation of the secondary product can be monitored by detecting formation of peroxide, heat, entropy, changes in mass, charge, fluorescence, luminescence, epifluorescence, absorbance or any of the other techniques previously noted in the context of primary product or product activity detection which result from contact between the substrate and the product.

[0462] Commonly, the product detector will be a protein detector and the purification module will include protein purification means such as those noted for product purification generally. However, nucleic acids can also be products of the array, and can be similarly detected.

[0463] Array members can be moved into proximity to the product identification module, or vice versa. For example, the product identification module can perform an xyz translation of either the identification module or the array (e.g., by conventional robotics as set forth herein), thereby moving the product identification module proximal to the array of reaction products. Similarly, the one or more reaction product array members can be flowed into proximity to the product identification module. In-line or off-line purification systems can purify the one or more reaction product array members from associated materials.

[0464] Commonly detected products include one or more polypeptides or polypeptide activities, one or more nucleic acids, one or more catalytic RNAs, or one or more biologically active RNAs or other nucleic acids (ribozyme, aptamer, anti-sense RNA, etc.).

[0465] As noted supra, the present invention provides for array duplication. For example, secondary product arrays can be produced by re-arraying members of the reaction product array at a selected concentration of product members in the secondary product array. The selected concentration can be approximately the same for a plurality of product members in the secondary product array (sometimes all of the array members are plated at the same concentration, but it is also possible to plate members at different concentrations to provide multi-concentration datapoints, e.g., for kinetic analysis). This normalization of concentration simplifies analysis by the product detection module.
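A minimal sketch of the arithmetic behind such normalization, assuming that the concentration of each source well is already known and that simple dilution (C1·V1 = C2·V2) is used, is given below. The function name, the units and the example values are hypothetical.

```python
from typing import Tuple


def normalization_volumes(
    source_conc: float, target_conc: float, final_volume: float
) -> Tuple[float, float]:
    """Return (sample volume, diluent volume) giving `target_conc` in `final_volume`.

    Uses C1 * V1 = C2 * V2; concentrations share one unit, volumes another.
    """
    if source_conc <= 0 or target_conc <= 0:
        raise ValueError("concentrations must be positive")
    if target_conc > source_conc:
        raise ValueError("dilution cannot raise the concentration")
    sample_volume = target_conc * final_volume / source_conc
    return sample_volume, final_volume - sample_volume


# Hypothetical well at 40 ug/ml re-arrayed at 10 ug/ml in a 100 ul final volume.
print(normalization_volumes(40.0, 10.0, 100.0))  # (25.0, 75.0)
```

Calling the function once per source well yields the transfer list a liquid handler would need to plate all members at the same concentration, or at several concentrations for kinetic analysis.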

[0466] Further details on array copy systems, including copying of product arrays, are found supra.

[0467] In addition to (or in place of) actually re-arraying materials, the detection module (or a separate module) can include an instruction set for determining a correction factor which accounts for variation in product concentration at different positions in the relevant array. For example, where product concentrations are known, a concentration-dependent correction can be applied to correct observed activity data.
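One simple form that such a correction could take, offered here only as an assumption, is to divide the observed activity at each position by the known product concentration at that position, giving an activity per unit concentration. The names and values below are hypothetical, and other correction models (e.g., non-linear ones) are equally possible.

```python
from typing import Dict


def corrected_activity(
    observed: Dict[str, float], concentration: Dict[str, float]
) -> Dict[str, float]:
    """Return activity per unit concentration for each position.

    Positions with no known (or zero) concentration are omitted rather
    than guessed at.
    """
    return {
        well: observed[well] / concentration[well]
        for well in observed
        if concentration.get(well, 0) > 0
    }


# Two wells with the same raw signal but a four-fold difference in expression.
raw = {"A1": 2.0, "A2": 2.0, "A3": 1.0}
conc = {"A1": 1.0, "A2": 4.0}  # no concentration known for A3
print(corrected_activity(raw, conc))  # {'A1': 2.0, 'A2': 0.5}
```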

[0468] Example: High Throughput Quantitation of Ligand Concentrations Using Surface Plasmon Resonance

[0469] Selective molecular breeding utilizes the ability to measure the biological activities of libraries of shuffled gene products. Quantitative or semi-quantitative high throughput (HTP) screening is used to rank clones with respect to biological activity during each round of shuffling. Automation of this process is useful for decreasing the cost and increasing the speed with which one can perform cycles of shuffling and screening.

[0470] A common problem with quantitation of libraries of shuffled proteins is that the proteins are expressed at relatively low levels (typically 1 ng to 1 microgram per ml) and in crude mixtures such as bacterial extracts, mammalian transfection supernatants, in vitro translation reactions, etc. The potentially small amounts of the expressed protein relative to the other components in the expression system can make quantitation challenging.

[0471] Surface plasmon resonance (SPR) is an established technique for measuring receptor-ligand interaction kinetics. See, e.g., Nieba et al. (1997) “BIACORE analysis of histidine-tagged proteins using a chelating NTA sensor chip” Anal. Biochem. 22(2): 217-218; Muller et al. (1998) “Tandem Immobilized Metal Ion Affinity Chromatography/Immunoaffinity purification of His-tagged proteins - evaluation of two anti-His-tag monoclonal antibodies” Anal Biochem. 259(1): 54-61; Linder et al. (1997) “Specific Detection of His-tagged Proteins with Recombinant anti-His tag scFv-Phosphatase or scFv-phage fusions” Biotechniques 22(1): 140-149. SPR allows one to measure these kinetics in the presence of complex mixtures such as are present in expression supernatants. If all proteins in a given library are tagged with an “equivalent” epitope tag and if a standard curve is established with an SPR probe, then one can derive the concentration of an unknown tagged protein in a complex supernatant by observing the kinetics of association with an immobilized antibody to the tag.

[0472] Surface plasmon resonance (SPR) has been widely exploited to measure the kinetics of interaction of a soluble ligand with a cognate receptor immobilized on a surface that is suitable for SPR analysis. This technique is very sensitive (one can easily measure ligands at nanomolar concentrations) and can be performed in the presence of complex mixtures such as are typically present in recombinant protein expression supernatants. The technique measures the kinetics of association and dissociation of the ligand:receptor pair. Given a standard curve, one can use kinetic measurements or equilibrium binding values to estimate absolute concentrations of unknown protein samples which have a constant ligand, such as an epitope tag, that can interact with a receptor immobilized on the sensor.

[0473] Preferably, SPR instruments are interfaced with robotic liquid handling apparatus and the detectors are multiplexed so that they can be used in 96-well formats. Although this example focuses on parallel 96-(or other) well SPR formats, a variant approach is to have one (or a few) SPR probes that are sequentially dipped into wells to serially measure protein concentrations in each well. This can be achieved by moving the probe from well to well (with a regeneration step in between) or by moving the plate on a movable stage so that wells are sequentially delivered to the probe.

[0474] This example, schematized in FIG. 18, provides for the construction of a microtiter tray compatible SPR device. SPR probe 18-1 is connected by fiber optic cables 18-2 to amplifier/detector 18-3. A 96 (or other)-well array (18-4) of SPR probes is fabricated with an anti-epitope tag antibody conjugated to the surfaces of each of the SPR probes (the epitope is attached to proteins in the library). The probe array is dipped into a plate containing, e.g., 96 unknown epitope tagged proteins (for a 96 well format) at unknown concentrations. Incident light is beamed from a source down fiber optic cables to the probes. The reflected light is then piped from the probe back to the amplifier where it is quantitated. The fraction of incident light that is reflected is sensitive to the refractive index difference between the probe and the material at the interface between the probe and the unknown solution. Specific binding of epitope tagged protein to the antibody on the probe increases the local index of refraction and this can be read out as a perturbation in the amount of incident light that is reflected. The probes can be standardized (shown as 1 μg/ml, 10 μg/ml and 100 μg/ml curves) against solutions containing known concentrations of epitope tagged proteins. The standardized probes are then dipped into the microtiter plate of unknown expression system components. The kinetics of association of the expressed proteins with the antibody on the SPR probe are measured and the concentrations of tagged protein in the unknown samples are calculated by comparison with the standard curve.
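The comparison with the standard curve can be sketched as follows. The sketch assumes that a single kinetic parameter, here an initial association rate, has been extracted for each standard and each unknown, and it interpolates between neighboring standards on a log-log scale. The rate values, units and function name are hypothetical; only the 1, 10 and 100 μg/ml standard concentrations come from the example above.

```python
import math
from typing import List, Tuple


def concentration_from_rate(
    rate: float, standards: List[Tuple[float, float]]
) -> float:
    """Estimate concentration from an association rate via a standard curve.

    `standards` holds (concentration, rate) pairs with positive values.
    Interpolation is linear in log10(rate) versus log10(concentration).
    """
    points = sorted(standards, key=lambda pair: pair[1])
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        if r_lo <= rate <= r_hi and r_hi > r_lo:
            frac = (math.log10(rate) - math.log10(r_lo)) / (
                math.log10(r_hi) - math.log10(r_lo)
            )
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("rate lies outside the calibrated range")


# Standards at 1, 10 and 100 ug/ml with hypothetical association rates.
curve = [(1.0, 2.0), (10.0, 20.0), (100.0, 200.0)]
estimate = concentration_from_rate(11.0, curve)
print(round(estimate, 3))  # 5.5 (ug/ml)
```

Unknowns whose rates fall outside the calibrated range are rejected rather than extrapolated, which would prompt re-measurement at a different dilution.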

[0475] In addition to SPR, other approaches to protein detection can also be used. For example, the in vitro translated protein of interest can be a fusion protein comprising a fluorescent or luminescent moiety such as a GFP protein. The amount of translated protein is proportional to the level of, e.g., GFP fluorescence and can be read by optical or spectroscopic methods.

[0476] Similarly, an epitope tag can be added as an invariant portion of any library (e.g., any shuffled library). A fluorescently labeled antibody to the tag is added to the translation mix and allowed to bind. Either this binding changes fluorescence, e.g., by FRET quenching/dequenching, or an on-line separation of antibody and protein is achieved by parallel capillary electrophoresis (e.g., in a microfluidic chip format).

[0477] In one embodiment, a specific invariant amino acid sequence is added to the library of shuffled proteins, the sequence forming an alpha helix which contains four cysteine residues in a tetrahedral array. FlAsH is added to the solution and binds to the epitope with a corresponding increase in fluorescence. There is no fluorescence background and so no separation is required. See also, Tsien et al. (1999) “Target Protein Sequences for Binding of Synthetic Biarsenical Molecules” WO 9921013 A1.

[0478] E. Array Correspondence/Secondary Diversification Module

[0479] The system optionally includes an array correspondence module which identifies, determines or records the location of an identified product in the array of reaction mixture products which is identified by the one or more product identification modules. The array correspondence module can also determine or record the location of at least a first nucleic acid member of an array, or a duplicate thereof, or of an amplified duplicate array, where the member corresponds to the location of one or more members of the array of reaction products. Most commonly, this correspondence module takes the form of a digital system having a query function, and, e.g., a look-up table that records the correspondence information across two or more arrays. For example, the query function can act on a user input to determine correspondence of array members in the look-up table, or the system can be configured automatically to assess correspondence of any array member which meets selected criteria (e.g., activity determined by the product detection module). Such correspondence modules can easily be programmed using available database or spreadsheet programs such as Microsoft Access™, Microsoft Excel™, Paradox™, Quattro Pro™, or any other available spreadsheet/database program.
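A minimal sketch of such a look-up table with a query function is shown below. The class name, the (plate, well) position format and the activity values are hypothetical; any database or spreadsheet program of the kind noted above could hold the same information.

```python
from typing import Dict, List, Tuple

Position = Tuple[str, str]  # (plate name, well), a hypothetical addressing scheme


class ArrayCorrespondence:
    """Look-up table linking product array positions to nucleic acid positions."""

    def __init__(self) -> None:
        self._table: Dict[Position, Position] = {}

    def record(self, product_pos: Position, nucleic_acid_pos: Position) -> None:
        """Store the correspondence between a product and its coding nucleic acid."""
        self._table[product_pos] = nucleic_acid_pos

    def query(self, product_pos: Position) -> Position:
        """Return the nucleic acid position for a user-supplied product position."""
        return self._table[product_pos]

    def select(self, activity: Dict[Position, float], threshold: float) -> List[Position]:
        """Return nucleic acid positions for every product meeting the criterion."""
        return [
            self._table[pos]
            for pos, value in activity.items()
            if value >= threshold and pos in self._table
        ]


table = ArrayCorrespondence()
table.record(("product plate 1", "A4"), ("DNA plate 1", "A4"))
table.record(("product plate 1", "B7"), ("DNA plate 1", "B7"))
readings = {("product plate 1", "A4"): 0.9, ("product plate 1", "B7"): 0.2}
print(table.query(("product plate 1", "A4")))  # ('DNA plate 1', 'A4')
print(table.select(readings, threshold=0.5))   # [('DNA plate 1', 'A4')]
```

The `query` method corresponds to a query acting on a user input, while `select` corresponds to automatic assessment of any array member meeting a selected criterion.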

[0480] This correspondence system can include one or more secondary selection modules which select at least one array member as a substrate for a further diversification reaction (e.g., by shuffling). The selection is based upon the location of a product identified by the product identification modules and the corresponding location of the corresponding nucleic acid array member identified by the array correspondence module.

[0481] In shuffling embodiments, the secondary selection system optionally includes a secondary recombination element which physically contacts members of the starting arrays of nucleic acids, or duplicates or amplicons thereof, to each other or to additional sources of nucleic acids, thereby permitting physical recombination between the first and additional members. In other aspects, all or part of the recombination is performed in silico, and no physical contact is required for recombination (or other diversity generating reactions).

[0482] a) Laboratory Information Management System

[0483] In general, data tracking can provide maintenance of the associations between array elements and results which correlate to the array elements. For example, sets of results on projects can include association of three relationships:

[0484] 1. Array member ID—Data Sample ID;

[0485] 2. Data Sample ID—Data Values;

[0486] 3. Data Values—Processed Results.

[0487] Relationship 1 includes the association of array member names with the identifiers of tested samples (e.g., “Plate 1, well A-4”). Relationship 2 includes the association of device data output with the tested samples. Relationship 3 includes the association of device output values with results.
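As a minimal sketch, the three relationships of paragraphs [0484] to [0487] can be represented as chained mappings. The member name `clone_0017`, the readings and the use of a mean as the processed result are assumptions made solely for illustration.

```python
# Relationship 1: array member ID -> data sample ID (hypothetical entry).
array_to_sample = {"clone_0017": "Plate 1, well A-4"}

# Relationship 2: data sample ID -> raw device output values (made-up readings).
sample_to_values = {"Plate 1, well A-4": [1032.0, 998.0, 1010.0]}


def process(values):
    """Relationship 3: data values -> processed result (here, simply the mean)."""
    return sum(values) / len(values)


def result_for(array_member_id):
    """Follow all three relationships from an array member to its result."""
    sample_id = array_to_sample[array_member_id]
    values = sample_to_values[sample_id]
    return {"sample_id": sample_id, "values": values, "result": process(values)}
```

Keeping the three associations as separate tables, rather than one flat record, allows a sample to be re-read or re-processed without disturbing the link back to the array member.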

[0488] In order to utilize systems and devices herein, an integrated sample tracking process can be used based on commercially available LIMS (Laboratory Information Management System) products. As each sample goes through many different formats (pooling, deconvolution, dilution, hit picking, assorted assay formats, etc.) it is useful to have a very flexible LIMS to capture that distribution of formats of parental samples and subsequent progeny samples. The generated data for each sample is subsequently integrated with each format and accessible for the user in conjunction with the samples' “pedigree.” The data is displayed through any one of many commercially available data analysis software packages such as SpotFire or ActivityBase to allow monitoring of the process.

[0489] For all data-generating devices, the output data can be associated with the sample ID. In other words, each data point can be associated with the well analyzed. This is relatively simple for most systems designed to scan microplates, such as plate readers, but can be more complex for systems where the analytes are sampled from their container, such as in mass spectrometry and HPLC. Where necessary, custom software is used to link data output to sample ID and output the resulting table to the database in a standard format.

[0490] HTP screening generates huge amounts of data, which is preferably stored in an organized way. Where the amount of data is too large for easy storage on data servers, a system for data archival and retrieval is also incorporated. The system can include, e.g., a table that tracks datafiles (which can be, e.g., data folders), based on, e.g., name and ID. The table has columns to store both a current location (such as a hard disk), e.g., in URL format, and a location on a backup disk. Backup disks (CD/DVD) themselves have an ID which can be tracked. Archiving can be done automatically, e.g., based on acquisition date or by user triggering. Backed up files are retained on the server and flagged. Once a backup takes place, the user can delete the file from the server.
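A tracking table of the kind described in paragraph [0490] might, for example, be sketched as below. The record fields, the `archive_older_than` function and the backup location format are hypothetical. The sketch only flags records as backed up and does not copy or delete any files.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DataFileRecord:
    """One row of a hypothetical datafile tracking table."""
    file_id: int
    name: str
    current_location: str                  # e.g., in URL format
    acquired: date
    backup_disk_id: Optional[str] = None   # ID of the CD/DVD holding the copy
    backup_location: Optional[str] = None
    backed_up: bool = False                # flag set once a backup exists


def archive_older_than(records, cutoff, disk_id):
    """Flag every not-yet-archived record acquired before `cutoff`.

    Returns the IDs of the newly flagged records. The file stays on the
    server until the user chooses to delete it.
    """
    archived = []
    for rec in records:
        if not rec.backed_up and rec.acquired < cutoff:
            rec.backup_disk_id = disk_id
            rec.backup_location = f"{disk_id}:/{rec.name}"
            rec.backed_up = True
            archived.append(rec.file_id)
    return archived
```

Archiving by acquisition date is shown here; user-triggered archiving would call the same update on records chosen by the user.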

[0491] F. Elements for Arraying and Handling Fluids in the Device

[0492] There are a number of common elements to the integrated systems herein which form a “backbone” for the device. For example, the device includes array elements, liquid handling elements, robotics (e.g., for moving microtiter plates) and the like.

[0493] (1.) Liquid Handler

[0494] The reactant arrays of the invention can be either physical or logical in nature. For the generation of common arrangements involving fluid transfer to or from microtiter plates, a fluid handling station is used. Several “off the shelf” fluid handling stations for performing such transfers are commercially available, including, e.g., the Zymate systems from Zymark Corporation (Zymark Center, Hopkinton, Mass.; http://www.zymark.com/) and other stations which utilize automatic pipettors, e.g., in conjunction with the robotics for plate movement (e.g., the ORCA® robot, which is used in a variety of laboratory systems available, e.g., from Beckman Coulter, Inc. (Fullerton, Calif.)).

[0495] In an alternate embodiment, fluid handling is performed in microchips, e.g., involving transfer of materials from microwell plates or other wells through microchannels on the chips to destination sites (microchannel regions, wells, chambers or the like). Commercially available microfluidic systems include those from Hewlett-Packard/Agilent Technologies (e.g., the HP2100 bioanalyzer) and the Caliper High Throughput Screening System (see, e.g., http://www.calipertech.com/products/index.htm). The Caliper High Throughput Screening System provides an interface between standard library formats and chip technologies (see, e.g., http://www.calipertech.com). Furthermore, the patent and technical literature includes examples of microfluidic systems which can interface directly with microwell plates for fluid handling.

[0496] Thus, generally, microfluidic systems are commercially available. In addition, university groups such as Mark Burns' research group at The University of Michigan also describe various microfluidic systems (http://dow3029-mac5.engin.umich.edu/; http://www.engin.umich.edu/dept/cheme/people/burns.html). Accordingly, general fabrication principles and the use of various microfluidic systems are known and can be applied to the integrated systems of the present invention.

[0497] (2.) Array Configurations

[0498] Any of a variety of array configurations can be used in the systems herein. One common array format for use in the modules herein is a microtiter plate array, in which the array is embodied in the wells of a microtiter tray. Such trays are commercially available and can be ordered in a variety of well sizes and numbers of wells per tray, as well as with any of a variety of functionalized surfaces for binding of assay or array components. Common trays include the ubiquitous 96 well plate, with 384 and 1536 well plates also in common use.

[0499] In addition to liquid phase arrays, components can be stored in solid phase arrays. These arrays fix materials in a spatially accessible pattern (e.g., a grid of rows and columns) onto a solid substrate such as a membrane (e.g., nylon or nitrocellulose), a polymer or ceramic surface, a glass or modified silica surface, a metal surface, or the like. Components can be accessed, e.g., by local rehydration (e.g., using a pipette or other fluid handling element) and fluidic transfer, or by scraping the array or cutting out sites of interest on the array.

[0500] While arrays are most often thought of as physical elements with a specified spatial-physical relationship, the present invention can also make use of “logical” arrays, which do not have a straightforward spatial organization. For example, a computer system can be used to track the location of one or several components of interest which are located in or on physically disparate components. The computer system creates a logical array by providing a “look-up” table of the physical location of array members. Thus, even components in motion can be part of a logical array, as long as the members of the array can be specified and located.
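For illustration, a logical array of the kind described in paragraph [0500] can be reduced to a look-up table keyed by member identity rather than by position. The class `LogicalArray`, its methods and the example locations are hypothetical.

```python
class LogicalArray:
    """A logical array: members are defined by a look-up table of their
    current physical locations, not by a fixed spatial layout."""

    def __init__(self):
        self._locations = {}

    def register(self, member_id, location):
        """Add a member and record where it currently is."""
        self._locations[member_id] = location

    def move(self, member_id, new_location):
        """Update the location of an existing member (e.g., while in transit)."""
        if member_id not in self._locations:
            raise KeyError(member_id)
        self._locations[member_id] = new_location

    def locate(self, member_id):
        """Return the last recorded physical location of a member."""
        return self._locations[member_id]


# Illustrative use with made-up members and locations.
logical_array = LogicalArray()
logical_array.register("clone_A", "freezer 2, rack 3, tube 14")
logical_array.register("clone_B", "plate 7, well H12")
logical_array.move("clone_B", "robot arm, in transit to plate reader")
```

Because membership depends only on the table, a component in motion remains part of the array so long as its entry is kept current.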

[0501] G. DNA Shuffling on Solid Supports

[0502] For clarity, much of the preceding discussion describes the use of liquid phase arrays such as those utilizing microtiter tray formats. However, as noted throughout, solid phase arrays represent an alternative and also preferred format for performing many operations of the systems herein. The following is a description of exemplary solid-phase shuffling formats.

[0503] As noted, DNA shuffling is a very powerful technique to generate diverse gene libraries from known gene family members through a combination of recombination, mutagenesis and selection. Current DNA shuffling methods can use primerless PCR assembly, where fragments of genes reassemble based upon the kinetics of oligo re-annealing, which are then extended by DNA polymerase in the presence of dNTPs.

[0504] A modification of this DNA shuffling process is performed where oligo annealing and extension by DNA polymerase proceed while the oligonucleotide, or alternatively, the single-stranded template polynucleotide, is tethered to a solid support (or substrate). The method below offers advantages over traditional solution-based assembly in that assembly occurs sequentially. Therefore, the specific fragments added at each step can be more tightly controlled than in solution-based assembly. Also, this embodiment optionally combines the assembly and rescue steps, reducing the complexity of the overall shuffling process. This new approach provides novel shuffling methods that utilize technology similar to the combinatorial synthesis of peptides and small molecules.

[0505] For example, one may create shuffled libraries by starting assembly using an oligonucleotide(s) that is/are tethered to a solid support. The process typically involves tethering the oligonucleotide(s) to a solid support so that at least about 10-20 nucleotides including the 3′ hydroxyl are exposed to solvent. In some embodiments, a synthesizer module is used to synthesize one or more nucleic acid fragments on a solid support. Such fragments are optionally created from one or more parental nucleic acid sequences by a computer operably coupled to the synthesizer module.

[0506] In any case, the oligo(s) are then typically annealed to mixtures of single stranded nucleic acid generated, e.g., by the processes discussed herein, for example, partial DNAse digestion of either PCR products of several related genes or genomic or cDNA from homologues of interest. The annealed hybrids are extended, typically with DNA polymerase (for example, with a thermostable DNA polymerase such as Taq DNA polymerase), generating a bound library of extended, solid-support tethered double stranded duplexes. The bound library is denatured to release the second strand. The tethered oligo is reannealed to the released library of DNAse treated fragments and extended. This process is repeated until fragments of desired length are formed. The library of shuffled products is released from the solid support and used as desired, e.g., for in vitro transcription/translation or cloning into vectors.

[0507] At any of these steps, the solid support allows one to purify the reaction products taking advantage of the properties of the solid support (for example, the solid support can include magnetic beads that can be manipulated by applying a magnetic field).

[0508] One feature of this approach is that by using an oligonucleotide of precise length to tether to the support (for example a 38 nt oligo) one has pre-determined the location of the first chimera (in the example, it will begin at nucleotide 39). This is true for the first oligonucleotide. This feature can be useful in keeping parts of the nucleic acid constant, e.g., for cloning purposes or where a feature is not desired to be diversified.

[0509] One can use this feature in (at least) two ways. First, if the genes are cloned into a similar vector, the first oligo can anneal to vector sequence (for example immediately adjacent to the gene coding region). In this way, the entirety of new gene combinations is synthesized from DNA fragments with randomly generated ends (e.g., from DNAse treatment), but the vector sequence is kept constant for cloning purposes.

[0510] Where one desires to eliminate this feature (where all nucleotides are to be varied for diversity generation purposes), one can tether a mixture of oligonucleotides of varying length to the support (for example, oligos from 35-50 nucleotides give chimeras starting in the range of nt 36 to nt 51), or one can vary the sequences of the tethered oligonucleotides to vary this region, e.g., according to the various in silico and oligonucleotide-mediated methods discussed above.
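The arithmetic in paragraphs [0508] and [0510] can be stated compactly: a tethered oligo of length n fixes positions 1 through n, so the first position at which a chimera can begin is n + 1. The following sketch, with hypothetical function names, reproduces the two examples given above.

```python
def first_variable_position(tether_length_nt):
    """First nucleotide position that can differ from the tethered oligo.

    Positions 1..tether_length_nt are fixed by the tether, so variation
    can begin at the next position.
    """
    return tether_length_nt + 1


def variable_start_range(tether_lengths_nt):
    """Range (lowest, highest) of chimera start positions for a mixture
    of tethered oligos of different lengths."""
    starts = [first_variable_position(n) for n in tether_lengths_nt]
    return min(starts), max(starts)
```

A 38 nt tether therefore gives a first chimera at nucleotide 39, and a mixture of 35 to 50 nt tethers gives start positions from nt 36 to nt 51.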

[0511] In typical DNA shuffling, extension of DNAse fragments occurs at any place annealing occurs. In contrast, tethering the oligo to solid supports likely restricts the choice of oligo to those at the ends of the DNA of interest (although one can tether using oligos designed to regions internal to the gene of interest, ultimately the entire DNA of interest is usually, though not always, re-assembled, e.g., to generate a full length, or substantially full length, heterolog).

[0512] The addition of DNA fragments to the tethered oligonucleotide is typically sequential. The assembly process can be paused at any step and conditions changed. For example, one can add or subtract gene fragments during the assembly. For example, one can start the assembly with genes 1, 2, and 3, but remove gene 1 after the initial round. Similarly, particular blends of genes can be selected at any stage to bias recombination (at any stage) towards one or more parental types. For example, one can change from genes 1-4 to only genes 1 and 4 after 5 extensions; or alter the representation of any gene in the recombination process, e.g., change gene 1, e.g., from 1:4 to 1:2 for the last 3 extensions to bias the recombination, e.g., to achieve selectable gene blending. Alternatively, one can alter PCR conditions for parts of the assembly, e.g., longer extensions at the 3′ end. This provides an improved level of control over the progress and outcome of shuffling experiments. For example, one can add DNAse fragments corresponding to the 5′ end of genes separately from fragments corresponding to the 3′ end.
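One way to picture the staged blending of paragraph [0512] is as a schedule that maps each extension to a gene blend. The sketch below encodes the example of using genes 1-4 for 5 extensions and then only genes 1 and 4. The assumption of equal shares within each blend, the choice of 3 extensions for the second stage, and the names `equal_blend`, `schedule` and `blend_for_extension` are illustrative only.

```python
from fractions import Fraction


def equal_blend(genes):
    """Blend in which every listed gene has the same share (an assumption)."""
    share = Fraction(1, len(genes))
    return {gene: share for gene in genes}


# (number of extensions, blend used for those extensions)
schedule = [
    (5, equal_blend([1, 2, 3, 4])),  # genes 1-4 for the first 5 extensions
    (3, equal_blend([1, 4])),        # only genes 1 and 4 afterwards
]


def blend_for_extension(schedule, n):
    """Return the gene blend to use for the n-th extension (1-based)."""
    completed = 0
    for count, blend in schedule:
        if n <= completed + count:
            return blend
        completed += count
    raise ValueError(f"extension {n} is beyond the end of the schedule")
```

Unequal representation of a gene would be expressed by replacing `equal_blend` with explicit shares for the relevant stage.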

[0513] An additional feature of the invention is that assembly and rescue can occur simultaneously. Also, the sequential nature of the addition of DNA allows for combinatorial DNA shuffling.

[0514] DNA shuffling can also be conducted on multiple genes in parallel in a single reaction pot. For example, DNA hybridization is a discrete process; under stringent conditions, oligos from gene A will only recognize DNA from gene A or related sequences, and ‘ignore’ oligos of non-gene A sequences. Assuming that gene A is unrelated to gene B, one can mix solid supports containing oligos from gene A and gene B, and mix them simultaneously with the DNAse treated fragments. Thus, several genes can be shuffled simultaneously, in the same reaction vessel.

[0515] As noted, solid phase shuffling provides several advantages. It is worth noting certain additional advantages. For example, solid phase synthesis of nucleic acids, proteins and other relevant components is straightforward, simplifying automation processes. Similarly, tethering optionally utilizes the attachment of oligos to gene chips, a commercially available technology platform (e.g., from Affymetrix, Santa Clara, Calif.). One may generate gene chips for shuffling or other diversity generation reactions.

[0516] Further, since the addition of DNA to the tether (assembly) is stepwise, this step by step process can be controlled (i.e., the reaction can be stopped at any point and conditions changed, such as temperature, salt, extension time, etc.).

[0517] One can include RNA polymerase promoters on oligos used in the assembly (i.e., an oligo 5′ to the coding region), and thereby transcribe RNA in vitro from the solid support linked gene libraries. Since one can transcribe RNA in vitro from these libraries, one can also translate in vitro to directly generate libraries of proteins without cloning. Even if yields of proteins from in vitro translation are low, the translation nonetheless allows very fast screening methods to be employed. Even low levels of expression are sufficient for a variety of methods such as antibody-based screening methods (e.g., ELISA) and enzyme-based detection assays in which signal is amplified in the assay process.

[0518] Because tethered DNA is easily purified, libraries can be pre-screened prior to cloning, to select for certain traits, or to select against certain traits (for example hybridization to a gene of interest, or lack of hybridization to the gene of interest), e.g., using appropriate gene chips.

[0519] Finally, the technology of using tethered molecules offers advantages in library tracking and cataloging.

[0520] Methods to purify only desired shuffled genes can be employed. For example, it is often advantageous to purify only those shuffled genes that are full-length (partial sequences are often less likely to be active). For example, one can synthesize a shuffled library with a tethered oligo that lies 3′ to the gene of interest, using an oligo that incorporates a promoter for an RNA polymerase (e.g., T7 RNA polymerase) 5′ to the coding region in the assembly process. RNA is transcribed using T7 polymerase. The resulting sample is treated with a nuclease that destroys single stranded DNA but leaves RNA/DNA hybrids intact (e.g., S1 or mung bean nuclease). DNA still linked to the solid support is purified. The sample is heated, or RNAse treated, to remove RNA. An oligo that anneals to sequence near the 5′ end of the gene (internal to the T7 polymerase promoter, but 5′ to the region of interest) is hybridized. The single stranded DNA product is extended using DNA polymerase to give a double stranded product. The material is removed from the solid support and cloned, or is in vitro transcribed (in place or in another reaction vessel).

[0521] Tethering methods include: chemical tethering, biotin-mediated binding, cross-linking to the solid support matrix (e.g., U.V., or fluorescence activated cross-linking) and the use of a ‘soluble’ matrix, such as PEG, which can be precipitated by EtOH or other solvents to recover bound material (see Wentworth, P., 1999, TIBTECH 17: 448-452).

[0522] (1.) Combinatorial Shuffling Using Solid Supports

[0523] By performing diversity generation reactions such as shuffling on solid supports, the variation accumulated in such experiments can be controlled. By using oligos linked to solid supports as outlined above, one can perform sequential additions of DNA by annealing and extension.

[0524] In one specific embodiment, this process is performed by: (1) For each family member, PCR amplifying the region of interest, digesting with DNAse, and isolating fragments. (2) Placing DNAse-treated fragments for each gene in a separate ‘cup’ (i.e., a cup for gene A, a cup for gene B, a cup for gene C, a cup for gene D). Each cup contains DNA fragments representing the whole of each gene, but each gene has its own cup. (3) In the first step, a single stranded oligonucleotide linked to a solid support (with 10-30 bp of accessible DNA, and an exposed 3′ hydroxyl) is divided into several equal fractions (in this example 4 fractions). Each fraction is placed into a separate ‘cup’ of DNA fragments from either gene A, B, C, or D. The ‘cups’ are heated to denature any double stranded hybrids present in each cup, then cooled to allow DNA to anneal. During this annealing, fragments homologous to the solid support-linked oligo anneal to this oligo. The annealed products are then extended with DNA polymerase to yield double stranded product, linked to the solid support (in this example, one fourth of the DNA is in a ‘cup’ containing gene A sequence, one fourth in a cup containing gene B sequence, one fourth gene C, one fourth gene D; however, an advantage of the system is that any ratios of starting genes may be used, e.g., to bias resulting recombinant nucleic acids towards one parent type). Following the ‘extension’ reaction, the double stranded DNA fragments are removed by virtue of their solid support linkage (e.g., magnetic beads), and pooled into one tube (or other container). These hybrids are heated to denature the duplexes, and the unlinked strand is washed away.

[0525] In a second round, the newly extended single stranded fragments are again randomly divided into pools (in this case 4), and each portion is again placed into one of the available cups (in this case 4 cups, for genes A, B, C, D). The ‘cups’ are heated to denature any double stranded hybrids present in each cup, then cooled to allow DNA to anneal. During this annealing, fragments homologous to the solid support-linked single stranded polynucleotide anneal. The annealed products are then extended with DNA polymerase to yield double stranded product, linked to the solid support (in this example, one fourth of the DNA is in a ‘cup’ containing gene A sequence, one fourth in a cup containing gene B sequence, one fourth gene C, one fourth gene D). Once again the extended products are removed and re-pooled into one container. This container is heated to denature the double stranded duplexes, and the strand unlinked to the support is washed away. The support-linked polynucleotide collection is now divided once again, and the process repeated.
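The split, extend and re-pool cycle of paragraphs [0524] and [0525] can be sketched as a simple simulation that records, for each support-linked molecule, which gene's cup it visited in each round. This models only the bookkeeping of cup assignment. It does not model annealing, extension lengths or failed extensions, and the function name `split_pool` and its parameters are hypothetical.

```python
import random


def split_pool(n_beads, cups, rounds, seed=0):
    """Simulate split-and-pool assignment of support-linked molecules.

    In each round the pooled beads are randomly divided into (nearly)
    equal fractions, one per cup, and each bead records the cup it was
    placed in. Returns one history string per bead, e.g. "ACB".
    """
    rng = random.Random(seed)
    histories = [[] for _ in range(n_beads)]
    for _ in range(rounds):
        order = list(range(n_beads))
        rng.shuffle(order)                       # random division of the pool
        for position, bead in enumerate(order):
            cup = cups[position % len(cups)]     # equal fractions per cup
            histories[bead].append(cup)
        # Beads are then re-pooled; the next round re-divides them.
    return ["".join(history) for history in histories]


# Illustrative run: 64 beads, 4 gene cups, 3 rounds.
histories = split_pool(n_beads=64, cups="ABCD", rounds=3)
```

With 4 cups and r rounds there are at most 4^r distinct cup histories, which indicates how quickly the number of possible parental combinations grows with each additional round.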

[0526] After a sufficient number of annealing/extension reactions, the final single stranded DNA products can be converted to double stranded DNA by annealing an oligonucleotide internal to the last oligonucleotide capable of attachment, and extended with DNA polymerase and dNTPs. The double-stranded products are then released from the solid support, and cloned. In order to facilitate cloning, several rounds of PCR amplification may be performed in the tube containing the support linked oligonucleotide, and this may act as a template for PCR while still attached to the solid support. Cloning can also be facilitated by incorporating the recognition sequence for one or several restriction nucleases into the sequence to be incorporated at each end of the assembled gene fragment.

[0527] One can design methods to eliminate support-linked oligos that fail to extend in any one step, if this is a source of substantial background.

[0528] (2.) Shuffling Using a Tethered Single-stranded Template

[0529] As an alternative to tethering oligonucleotide primers to a solid support, single-stranded template polynucleotides can be immobilized on a solid support as described above (e.g., by: chemical tethering, biotin-mediated binding, cross-linking to the solid support matrix, etc.). In one preferred embodiment, the template polynucleotides are arrayed by depositing a solution containing the template nucleic acids on a glass slide coated with a polycationic polymer such as polylysine or polyarginine (see, e.g., U.S. Pat. Nos. 5,807,522 and 6,110,426 “METHODS FOR FABRICATING MICROARRAYS OF BIOLOGICAL SAMPLES” to Brown and Shalon). The template polynucleotide can be either DNA or RNA, or a combination of DNA and RNA. A wide variety of suitable templates exist, and can be selected by the practitioner depending on the specific application. For example, desirable template polynucleotides include genomic and/or expressed (e.g., cDNA) sequences including coding, non-coding, antisense, naturally occurring, artificial, consensus, synthetic and/or substituted (e.g., dUTP substituted DNA) molecules. In some applications, a population of identical polynucleotides is arrayed on a support. In other applications, templates representing a diverse population of polynucleotides are attached to a support. For example, entire genomes, e.g., bacterial or fungal genomes, can be arranged in a physical array on a glass slide or silicon chip. In yet other applications, the expression products of a cell, or a subset thereof, are affixed to the support. Such expression products can be RNA or cDNA, and in some cases comprise a library of expression products. The present invention is not limited by the choice of template, or the source of polynucleotide selected. Such routine selections are based on the particular application, and will be readily apparent to one of skill in the art.

[0530] Diversity is introduced by hybridizing single-stranded nucleic acid fragments to the immobilized template polynucleotide. Typically, the nucleic acid fragments will possess regions of sequence similarity (or identity) as well as regions of dissimilarity. In many cases, annealing of multiple complementary (or partially complementary) fragments results in hybridization of partially overlapping fragments to the immobilized template. A polymerase (e.g., a DNA or RNA polymerase such as a thermostable DNA polymerase) is used to extend the annealed primers generating a heteroduplex made up of the template and a substantially full-length heterolog complementary (i.e., that hybridizes) to the template nucleic acid. Optionally, the unhybridized overhanging regions can be removed, e.g., with a nuclease, prior to or following extension, and/or the gaps between annealed (and extended) fragments joined with a ligase. In some cases, it is desirable to employ a nuclease or ligase with polymerase activity. This process is illustrated in FIG. 31, in which a solid phase-bound template is hybridized to appropriate fragments. As shown, the fragments are extended, if desired, unwanted flaps are digested and breaks in the resulting extended nucleic acids sealed with ligase.

[0531] The process can be repeated for multiple cycles by denaturing the heteroduplex and hybridizing the template to a new set (or subset) of nucleic acid fragments. The recombinant heterologs generated in each cycle are optionally recovered between successive cycles of denaturation and reannealing. Most typically, recovery relies on amplification, although other methods such as hybridization and/or cloning are also feasible. Optionally, the recovered heterolog can be used directly in additional diversity generating procedures, as described herein and in the cited references.

[0532] Frequently, recovery is facilitated by incorporating a sequence that serves as a primer for the amplification reaction within the template or a fragment nucleic acid sequence. For example, the template can incorporate recognition sequences for “universal” and “reverse” primers at its 5′ and 3′ ends, respectively. Among the fragments hybridized to the template are included the corresponding universal and reverse primers. Subsequent amplification of recombinant polynucleotides then proceeds according to routine amplification procedures.

[0533] In addition to the commonly used linear sequence primers (such as universal and reverse primers), the present invention makes use of primer sequences with a specialized secondary structure for facilitating recovery of the recombinant heterologs generated by extension of fragments annealed to a specified template. For example, a boomerang DNA amplification reaction is primed by a single primer located internal to the recombinant heterolog (for example, a conserved region of the template/fragments can be selected for use as a primer binding site). As illustrated in FIG. 32A, adaptors that assume a hairpin configuration are ligated to the end(s) of the heteroduplex which is optionally released from the solid support. Following denaturation of the heteroduplex, and binding of the internal primer, extension by a DNA polymerase results in extension of a product including sequences identical to the heterolog and the template as an inverted repeat. Typically, a restriction enzyme recognition site is incorporated into the hairpin, permitting separation of the template and heterolog sequences.

[0534] Another alternative is to employ a “vectorette.” In this approach, amplification occurs between an internal primer and a primer within the vectorette, a pair of synthetic oligonucleotides having regions of duplexed DNA flanking a central mismatched region that provides a primer binding site, as illustrated in FIG. 32B. If the target nucleic acids are cleaved with a restriction enzyme prior to ligation of the vectorette sequence, only restriction fragments including the internal primer binding site are amplified. A first extension cycle results in a duplex corresponding to the recombinant heterolog which can be simply amplified using the internal and vectorette primers. A variation of this approach is the “splinkerette,” in which the vectorette incorporates a looped-back hairpin structure that decreases end-repair priming and reduces non-specific priming. Further details on vectorette use and construction can be found in Arnold et al. (1991) “Vectorette PCR: a novel approach to genomic walking” PCR Methods Appl. 1: 39-42 and Hengen (1995) “Vectorette, splinkerette and boomerang DNA amplification” Trends Biochem Sci. 20: 372-3.

[0535] As previously described, recombinant nucleic acids produced by hybridization and extension of nucleic acids on an array can further be translated to provide reaction products suitable for screening. Alternatively, the recombinant heterologs described above can be transformed and expressed in cells to facilitate screening by structural and/or functional means to identify recombinants with desirable properties. Typically, but not necessarily, the recombinant nucleic acids are introduced into host cells in a vector, such as an expression vector. Vectors and cells incorporating recombinant polynucleotides produced by the above described recombination on a solid phase support are also a feature of the invention.

[0536] H. An Example Integrated System for Diversity Generation Via Shuffling

[0537] This example “shuffling machine” is an integrated system which converts parent DNA into improved shuffled clones, which are optionally used as parent DNAs for subsequent shuffling. The machine is based upon a set of modules as discussed above that are integrated to improve function and throughput.

[0538] The machine performs a number of tasks, using a liquid handling station, a PCR system, a fluorescence/absorbance plate reader, a plate/reservoir storage device and a robotic system for shuttling plates between the modules. This machine performs the entire shuffling process automatically in a microtiter plate format.

[0539] For clarity of description, the machine is split into a number of modules; however, module functions can be combined in practice to simplify the overall system. An example schematic of the modules of an integrated shuffling machine is provided by FIG. 2. The modules include a shuffling module, a library quality assessment module, a dilution module, a protein expression module, and an assay module. Typical integrated device elements include thermocyclic components, single and multi-well liquid handling, plate readers and plate handlers.

[0540] (1.) The Shuffling Module

[0541] This example shuffling module uses a liquid handler, a PCR machine, a fluorescent plate reader, and a plate/reservoir handling and storage system to perform an automated shuffling reaction (as noted, shuffling is one preferred diversity generation reaction performed by the methods and systems herein).

[0542] FIG. 3 provides a schematic representation of the steps performed by this exemplary shuffling module. In particular, a single pot reaction is performed, utilizing uracil incorporation, DNA fragmentation and assembly. A rescue PCR is performed, the results are assessed with PicoGreen, and any wells that test positive for PicoGreen incorporation are rescued and sent to the library quality modules.

[0543] As noted, DNA fragmentation is achieved using the uracil incorporation strategy noted above. Different wells of a microtiter plate are set up with different reaction conditions, leading to different DNA size fragments and different ratios of parental nucleic acids (the diversity target sequences). The conditions for the uracil fragmentation are defined by the user, as are the assembly and rescue protocols.

[0544] In other embodiments, the conditions and/or protocols are calculated using a set of computer understandable instructions, e.g., embodied in a computer or web page operably coupled to the shuffling module. Alternatively, the shuffling module is optionally a programmable or programmed module that calculates appropriate conditions, e.g., based on empirical data, theoretical predictions and/or user input.

[0545] Once the fragmentation is complete (as selected by the user) the fragmented DNA is transferred to a PCR module for the assembly reaction. An aliquot of the assembled DNA is then transferred to a new PCR plate for a rescue PCR reaction using standard primers.

[0546] The success of the shuffling reactions is measured by removing an aliquot from the rescue PCR plate and transferring it to a plate containing PicoGreen dye.

[0547] Wells that contain double stranded DNA (i.e., give fluorescence with PicoGreen) are collated by the liquid handler, using hit pick software, into plate(s) that contain all the shuffled clones, which are passed on to the library quality module.

[0548] The liquid handler then transfers (and, optionally, mixes or otherwise modifies) materials to make up solutions from solvent/reagent reservoirs, setting out an array of reactants. The information as to which solutions are plated in which positions in an array is tracked through subsequent manipulations in all modules, along with the PCR conditions which are used for amplification.

[0549] Once the rescue PCR is performed, the success of the recombination is assigned based upon the presence of double stranded DNA as measured by PicoGreen fluorescence. Full length ds DNA can also be unambiguously identified and quantified by capillary electrophoresis (e.g., in parallel formats similar to a parallel capillary electrophoresis sequencer such as MEGABASE or by parallel capillary electrophoresis on a chip) with detection by fluorescence. Successful recombination leads to predominantly a single full-length species in the rescue PCR, the amount of which is proportional to the measured fluorescence (in arbitrary units). As noted above, PicoGreen fluorescence is a quantitative measure of the amount of ds DNA present, and this information about the DNA concentration in each well is used in the downstream processing modules. The hit picking software takes the positive wells and converts them to new well positions without loss of information. The set of positive wells across all of the plates is referred to as a “collated library.”
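The hit picking step described above can be illustrated with a minimal sketch. It assumes a 96-well plate layout, a user-chosen fluorescence threshold, and hypothetical names (`hit_pick`, `well_name`) that do not come from any particular hit picking package. Each positive well is assigned a new position while its source plate, source well and fluorescence reading are retained, so that no information is lost.

```python
from string import ascii_uppercase

ROWS, COLS = 8, 12  # assumed 96-well destination plates


def well_name(index):
    """Convert a 0-based index into a well label such as 'A1' or 'H12'."""
    row, col = divmod(index, COLS)
    return f"{ascii_uppercase[row]}{col + 1}"


def hit_pick(readings, threshold):
    """Collate wells whose fluorescence exceeds `threshold`.

    readings: dict mapping (plate_id, well) -> fluorescence (arbitrary units).
    Returns a list of records that keep the source position and signal
    alongside the newly assigned destination plate and well.
    """
    hits = sorted((key, value) for key, value in readings.items()
                  if value > threshold)
    collated = []
    for n, ((plate, well), signal) in enumerate(hits):
        dest_plate, dest_index = divmod(n, ROWS * COLS)
        collated.append({
            "source_plate": plate,
            "source_well": well,
            "fluorescence": signal,          # reused downstream as a DNA estimate
            "dest_plate": dest_plate + 1,    # destination plates numbered from 1
            "dest_well": well_name(dest_index),
        })
    return collated


if __name__ == "__main__":
    example = {("P1", "A1"): 50.0, ("P1", "A2"): 900.0, ("P2", "B3"): 1200.0}
    for record in hit_pick(example, threshold=500.0):
        print(record)
```

The returned list plays the role of the collated library: positive wells from any number of source plates are packed contiguously into destination plates, and the retained fluorescence values remain available to downstream modules.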

[0550] Another exemplary shuffling module or diversity generation device comprises a programmed thermocycler and a fragmentation module operably coupled to the thermocycler. The programmed thermocycler typically comprises a thermocycler operably coupled to a computer comprising one or more instruction set. In other embodiments, the instruction sets are embodied in a web page or in the thermocycler itself, e.g., a Java program. For example, a network card is optionally added to a thermocycler or the internal software of a commercially available thermocycler is altered to provide the instruction sets described below.

[0551] The instruction sets typically comprise computer understandable instructions for performing one or more of the following: calculation of an amount of uracil and an amount of thymidine for use in the programmed thermocycler; calculation of one or more crossover region between two or more parental nucleotides; calculation of an annealing temperature; calculation of an extension temperature; and/or selection of one or more parental nucleic acid sequence. These calculations are typically made based on one or more of: user input, empirical data, and theoretical predictions, e.g., of melting temperature. Such melting temperature predictions are well known to those of skill in the art. In addition, predictions are also optionally used to calculate the effect of annealing temperatures on the number of possible crossovers. Typical input data include, but are not limited to, parental nucleic acid sequences, desired fragmentation lengths, crossover lengths, extension temperatures, and annealing temperatures. Empirical data typically comprise comparisons of one or more nucleic acid melting curve or melting temperature.

[0552] The computer or programmable thermocycler typically calculates possible crossover regions between parental nucleic acid sequences, depending on the annealing temperature and extension temperatures to be used in the amplification steps. The computer then sets up one or more cycle for the thermocycler. For example, a cycle in the thermocycler typically includes amplification of one or more parental nucleic acid sequence; fragmentation of the one or more parental nucleic acid sequence to produce one or more nucleic acid fragment; reassembly of the one or more nucleic acid fragment to produce one or more shuffled nucleic acid; and amplification of the one or more shuffled nucleic acid. Various robotics and plate handlers are optionally added to the device as described herein to transfer nucleic acids between the fragmentation module and the thermocycler.

[0553] In some embodiments, the thermocycler amplifies the various parental nucleic acids in the presence of uracil and the fragmentation device fragments the parental nucleic acids using various uracil cleaving enzymes. The programmable thermocycler in this embodiment typically directs a pause in the cycle to allow the addition of the enzymes to the reaction mixtures. In addition, the programmed thermocycler is used to calculate the ratio of uracil residues to thymidine residues to produce fragments of a desired mean length or size. For example, a length that leads to an optimized level of diversity in the shuffled nucleic acids is optionally selected. Fragmentation is optionally carried out in the presence of Taq/Pwo and outside primers so that the fragments are used directly in the reassembly/amplification steps of the cycle with appropriately calculated annealing and extension temperatures. Other fragmentation methods optionally used in a fragmentation module of the invention and operably coupled to a programmed thermocycler include, but are not limited to, sonication, DNase II digestion, random primer extension, and the like.
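As a rough illustration of how a uracil to thymidine ratio might be calculated for a desired mean fragment length, the following sketch uses a deliberately simplified model that is an assumption of this example and is not taken from the disclosure: uracil is taken to replace thymidine at each T position independently with probability f, every incorporated uracil is taken to yield a strand break, and only one strand is considered. Under these assumptions the mean spacing between breaks is approximately 1/(f × pT), where pT is the fraction of T residues in the sequence. The function names are hypothetical.

```python
def thymine_fraction(seq):
    """Fraction of positions in `seq` that are T (single strand only)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return seq.count("T") / len(seq)


def uracil_fraction_for_length(seq, mean_fragment_length):
    """Estimate the fraction f of T positions to substitute with U.

    Simplified model (an assumption of this sketch): cleavage occurs at
    every incorporated uracil, so mean fragment length ~ 1 / (f * pT).
    """
    p_t = thymine_fraction(seq)
    if p_t == 0:
        raise ValueError("sequence contains no thymidine")
    f = 1.0 / (mean_fragment_length * p_t)
    if f > 1.0:
        raise ValueError("requested fragments are shorter than the T spacing")
    return f


def uracil_to_thymidine_ratio(f):
    """Convert a substitution fraction f into a U:T ratio."""
    if not 0.0 <= f < 1.0:
        raise ValueError("f must be in [0, 1)")
    return f / (1.0 - f)


if __name__ == "__main__":
    parent = "ACGT" * 250                      # 1 kb toy sequence, 25% T
    f = uracil_fraction_for_length(parent, 100)
    print(f, uracil_to_thymidine_ratio(f))
```

In practice the incorporation efficiency of the polymerase and the completeness of cleavage would be expected to shift these values, which is one reason empirical data are also contemplated as an input.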

[0554] In another embodiment, a diversity generation device comprises a computer, a synthesizer module, e.g., a microarray oligonucleotide synthesizer such as an ink-jet printer head based oligonucleotide synthesizer, and a thermocycler. The computer typically comprises at least a first instruction set for creating one or more nucleic acid fragment sequence from one or more parental nucleic acid sequence. The synthesizer module typically synthesizes the one or more nucleic acid fragment sequence created by the computer; and the thermocycler generates one or more diverse sequence from the one or more nucleic acid fragment sequence, e.g., by performing an assembly/rescue PCR reaction as described above. For example, the synthesizer optionally synthesizes the nucleic acid fragments on a solid support as described above, e.g., using mononucleotide coupling reactions or trinucleotide coupling reactions.

[0555] In addition, the computer optionally comprises additional instruction sets, e.g., for determining a set of conditions for the thermocycler, e.g., to perform assembly/rescue PCR reactions.

[0556] For example, sequences, e.g., DNA, RNA, or protein sequences, are entered into a computer, e.g., as character strings corresponding to the sequences. The computer is then used to generate a number of smaller sequences from which oligonucleotides can be created. These smaller sequences typically encode some or all of the diversity of the original sequences entered. Typically, the instruction sets, e.g., in a computer, or web page, or both, limit or expand diversity of the one or more nucleic acid fragment sequence, e.g., a parental nucleic acid sequence, by adding or removing one or more amino acid having similar diversity; selecting a frequently used amino acid at one or more specific position; using one or more sequence activity calculation; using a calculated overlap with one or more additional oligonucleotide; based on an amount of degeneracy; or based on a melting temperature. The sequences are then used to drive a synthesizer, e.g., an oligonucleotide synthesizer, to create a physical manifestation of the sequences, e.g., on a support medium or solid support. Once the oligonucleotides are synthesized, the solid support is optionally digested or the oligonucleotides are cleaved from the support, e.g., using the thermocycler. The mix of oligonucleotides is then used in the thermocycler, which creates full length sequences, e.g., shuffled sequences. The computer is also optionally used to determine the best conditions for the assembly/rescue reaction and digestion.

[0557] The above device allows one to generate synthetic shuffled genes starting with only sequence data in a matter of hours. Combined with a high throughput screening device, the genes are all optionally created and screened for desired characteristics in less than a day. Therefore, the devices described above also optionally comprise screening modules, e.g., high-throughput screening modules, for screening the one or more diverse sequence for a desired characteristic. In addition, the computer is optionally used to select the original sequences used to create the fragments for shuffling, as described above.

[0558] The above diversity generation devices are typically used to allow rapid shuffling of nucleic acids to create new and diverse nucleic acids, e.g., encoding enzymes. In some embodiments, the devices are incorporated into kits comprising, e.g., the devices, reagents, and appropriate protocols for shuffling. For example, a kit optionally comprises a diversity generation device as described herein, e.g., comprising a pre-programmed PCR machine, and one or more reagent for generating diverse nucleic acids. Reagents include, but are not limited to, E. coli, e.g., a dut-ung strain to make plasmids containing uracil instead of thymidine, PCR reaction mixtures comprising a mixture of uracil and thymidine, one or more uracil cleaving enzyme, a PCR reaction mixture comprising standard dNTPs, polymerases, and the like. Possible uracil cleaving enzymes included in the kit are uracil glycosidase, an endonuclease, such as endonuclease IV, and the like. The uracil/thymidine ratios included with the kit can be optimized to produce fragments of particular size, or the protocols and/or diversity generation devices are programmed to calculate the appropriate ratios. Concentrations of dNTPs, Mg and other reagents are also optionally provided in optimized formats. In addition, the number of cycles is also optionally optimized, e.g., by a programmed thermocycler.

[0559] Polymerases included with the kits are typically thermostable polymerases, e.g., non-proof reading and proof-reading polymerases. In addition, the kits optionally include artificially evolved enzymes, e.g., artificially evolved polymerases that have a higher fidelity of incorporation for uracil residues, or are more active at 25° C. than those presently available.

[0560] The kits and devices above are optionally used to create an entirely automated format for generating diversity, e.g., through shuffling. In addition, they can be combined in a variety of ways with other components described herein, e.g., to create high throughput shuffling and screening capacity.

[0561] (2.) Library Quality Module

[0562] The library quality module utilizes the liquid handler, the PCR system, the fluorescence plate reader and the plate/reservoir handling and storage system.

[0563] FIG. 4 provides a schematic overview of a Library Quality Module. In particular, the module divides reactions into multiple plates, performs a crossover assessment, verifies PCR by PicoGreen incorporation and performs a hit pick quality rating.

[0564] The collated shuffled library from the shuffling module is diluted into one or more daughter plate to achieve a standard DNA concentration. This daughter plate is used as the source plate for DNA templates in quality assessment PCR reactions. Each parental DNA serves as the template to design forward and reverse PCR primers. These primers are mixed combinatorially such that recombinants can be detected (e.g., by mixing forward primer “A” which uniquely recognizes parent “A” with reverse primer “B” which uniquely recognizes parent “B,” etc., covering all possible combinations of primers, or a desired subset thereof). The PCR reactions are transferred to a plate for PicoGreen quantitation. The collated libraries are ranked with respect to diversity based on the level of fluorescence in each reaction and the number of PCR reactions that give amplification. The top collated libraries are then (optionally) re-collated to provide diverse collated libraries which are passed on to the in vitro transcription/translation module, or the hits are simply passed on to the in vitro transcription/translation module.

[0565] The DNA concentrations determined by the shuffling module are used to normalize template DNA concentrations in this module. The number of different PCR reactions run is determined by the number of starting parental sequences and the amount of information desired (e.g., 2^(number of parents − 1) reactions gives good information) to determine the best library. A hypothetical “perfect” library gives the same amplification rate (and hence fluorescence) in each PCR reaction. While this does not give the number of crossover genes per se, it can be used to ensure that there is a diversity of sequences that have at least one crossover.
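The combinatorial primer scheme and the notion of a “perfect” library can be sketched as follows. The sketch enumerates the full set of forward/reverse parent-specific primer combinations (a desired subset can be used instead, as noted above) and scores a library by the coefficient of variation of its fluorescence readings, so that a hypothetical perfect library scores zero. The use of the coefficient of variation as the evenness score, and the function names, are assumptions of this example.

```python
from itertools import product
from statistics import mean, pstdev


def primer_pairs(parents):
    """All (forward, reverse) combinations of parent-specific primers."""
    return list(product(parents, repeat=2))


def recombinant_pairs(parents):
    """Pairs whose forward and reverse primers recognize different parents.

    Amplification from such a pair indicates at least one crossover
    between the two primer sites.
    """
    return [(fwd, rev) for fwd, rev in product(parents, repeat=2) if fwd != rev]


def evenness(fluorescence):
    """Coefficient of variation of the readings (0 = perfectly even)."""
    values = list(fluorescence)
    average = mean(values)
    if average == 0:
        return float("inf")  # no amplification in any reaction
    return pstdev(values) / average


if __name__ == "__main__":
    parents = ["A", "B", "C"]
    print(len(primer_pairs(parents)), "reactions in the full set")
    print(recombinant_pairs(parents))
    print(evenness([100.0, 100.0, 100.0]))  # hypothetical perfect library
```

Libraries could then be ranked by a combination of the number of primer pairs that give amplification and this evenness score, with lower scores indicating a more uniform representation of crossovers.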

[0566] (3.) Dilution Module

[0567] The dilution module uses the liquid handler, the PCR system, the fluorescence plate reader and the plate/reservoir handling and storage system.

[0568] FIG. 5 provides a schematic overview of the dilution module activities. In particular, DNAs are diluted to the desired number of copies per well, PCR amplified, assessed for dsDNA by PicoGreen, and hits are picked.

[0569] The top collated libraries are reamplified, incorporating a reporter protein into the library, either as a fusion or as part of a translationally coupled system. An aliquot of this material is removed for quantitation and the library is diluted and dispensed into microtiter wells at an average concentration of about 1-10 DNA molecules/well.

[0570] The DNA is amplified by PCR to give enough DNA for efficient in vitro transcription/translation (ivTT) and an aliquot is removed for quantitation with PicoGreen. The wells where DNA is amplified are then hit picked into wells ready for transfer to the protein expression module. A number of wells in each plate are filled with standard control constructs (e.g., wild type and a negative control) at the same concentration as the library clone pools.

[0571] In general, the dilution which gives a concentration of 1-10 DNA molecules/well is determined from a standard curve. The reporter protein is chosen to give a construct that efficiently undergoes ivTT for a large number of systems. This also standardizes the ivTT procedure for all proteins.
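For orientation, the following sketch shows one way a dilution factor for a target of 1-10 molecules/well could be estimated from a measured DNA concentration, as a complement to the standard curve. It assumes an average mass of about 650 g/mol per base pair of double stranded DNA and treats the number of molecules dispensed per well as Poisson distributed; both are assumptions of this example, as are the function names.

```python
import math

AVOGADRO = 6.022e23      # molecules per mole
BP_MASS = 650.0          # approximate g/mol per base pair of dsDNA (assumed)


def molecules_per_ul(ng_per_ul, length_bp):
    """Approximate number of dsDNA molecules per microliter."""
    moles_per_ul = ng_per_ul * 1e-9 / (length_bp * BP_MASS)
    return moles_per_ul * AVOGADRO


def dilution_factor(ng_per_ul, length_bp, target_per_well, well_volume_ul):
    """Fold dilution giving `target_per_well` molecules in `well_volume_ul`."""
    stock = molecules_per_ul(ng_per_ul, length_bp)
    return stock * well_volume_ul / target_per_well


def poisson_pmf(k, lam):
    """Probability of exactly k molecules in a well when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


if __name__ == "__main__":
    fold = dilution_factor(ng_per_ul=10.0, length_bp=1000,
                           target_per_well=5, well_volume_ul=1.0)
    print(f"dilute about {fold:.2e}-fold")
    print(f"empty wells at a mean of 1 per well: {poisson_pmf(0, 1.0):.3f}")
```

The Poisson term is a reminder that at a mean of about one molecule per well a substantial fraction of wells receives no template, which is consistent with hit picking only those wells in which DNA is amplified.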

[0572] (4.) Protein Expression Module

[0573] The protein expression module uses the liquid handler, the fluorescent plate reader and the plate/reservoir handling and storage system.

[0574] FIG. 6 provides a schematic overview of the activities of the expression module, i.e., the addition of DNA to cell-free ivTT reaction mixtures to form arrays of reaction mixtures, an assay for a co-translational product as a control, and the picking of hits by the presence of the co-translational control product.

[0575] The pooled library members are taken from the dilution module and an aliquot is removed in which the DNA concentration is adjusted for optimal ivTT. The rest of the ivTT mix is then added to the wells and protein production is initiated. The efficiency of the ivTT reaction is measured using the activity of the reporter protein. For example, if the reporter is green fluorescent protein (GFP), then efficiency is measured by directly monitoring fluorescence. If the reporter is an enzyme, an aliquot is typically removed for appropriate processing.

[0576] The wells which give efficient protein production are then rearrayed into new microtiter plates and passed on to the assay module.

[0577] The DNA concentration in each well is determined by the dilution module and therefore the amount of DNA in each well can be normalized to a corrected value for efficient ivTT. The wells which contain the control constructs are tracked so that the activity of the library clones can be compared to the initial wild type.

[0578] (5.) Assay Module

[0579] The assay module uses the liquid handler, a fluorescent/colorimetric/luminometer plate reader and the plate/reservoir handling and storage system.

[0580] FIG. 7 provides a schematic overview of the exemplar assay module. In particular, expression mixtures are added to assay reagents (or vice versa) and changes in a detectable marker such as absorbance, fluorescence or luminescence are detected and hits picked. Similarly, the assay module can include an autosampler which interfaces with a CE, MS, GE or other system. SPR (Surface Plasmon Resonance) can also be used to measure protein binding. SPA (Signal Proximity Assay) methods can also be used, e.g., using a luminescence plate reader.

[0581] The protein solutions provided by the protein expression module are tested for the properties of interest. The proteins are typically diluted to a standard concentration before the assay, using the level of the reporter protein as a marker.

[0582] The protein solutions are aliquoted out and assayed using any format that leads to a spectrophotometric change in the properties of the assay mix. A majority of proteins may be assayed, directly or indirectly, using such formats (e.g., to monitor changes in pH, production of fluorescent product, loss of turbidity on hydrolysis, coupled assays, etc.).

[0583] Alternatively, the proteins can be assayed using heat production or oxygen consumption, changes in conductivity (ion production), parallel CE, GC, or the like. These properties of solution are readily quantified, e.g., using microfabricated devices as discussed above.

[0584] The proteins that are determined to be better than wild type according to the criteria of the assay are identified and the positions of the corresponding clones are determined.

[0585] The proteins are normalized to account for expression artifacts in the ivTT reaction. The activity of both the wild type and negative control clones is measured and used as a measure of the range of the assay. The variation in the controls (standard deviation) determines how significant differences are among the hits, as well as providing for statistical comparisons (e.g., standard average deviations as compared to wild type, etc.).
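The normalization and control-based comparison described above can be sketched as follows. Activities are divided by the reporter signal to correct for expression differences, the wild type and negative controls define the assay range, and each clone is expressed both as a fraction of that range and as a number of wild type standard deviations. The specific formulas and names are assumptions of this example rather than a prescribed scoring scheme.

```python
from statistics import mean, stdev


def normalize(activity, reporter):
    """Correct a raw activity for expression level using the reporter signal."""
    if reporter <= 0:
        raise ValueError("reporter signal must be positive")
    return activity / reporter


def score_clones(clones, wild_type, negative):
    """Compare normalized clone activities with the control wells.

    clones:    dict mapping well -> normalized activity
    wild_type: normalized activities of wild type control wells (>= 2 values)
    negative:  normalized activities of negative control wells
    """
    wt_mean, wt_sd = mean(wild_type), stdev(wild_type)
    neg_mean = mean(negative)
    assay_range = wt_mean - neg_mean
    scores = {}
    for well, value in clones.items():
        scores[well] = {
            # 1.0 corresponds to wild type, 0.0 to the negative control
            "relative": (value - neg_mean) / assay_range,
            # number of wild type standard deviations above wild type
            "z": (value - wt_mean) / wt_sd,
        }
    return scores


if __name__ == "__main__":
    clones = {"A1": normalize(26.0, 2.0), "A2": normalize(9.5, 1.0)}
    print(score_clones(clones, wild_type=[9.0, 10.0, 11.0], negative=[0.0, 0.0]))
```

A cutoff on the z value (for example, several standard deviations above wild type) would then be one way to decide which wells count as hits.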

[0586] (6.) Deconvolution of Hits and Retesting

[0587] The clone pools can be reconfirmed and deconvoluted by submitting them to the dilution module. This separates the pool of about 10 clones into a few hundred wells, with increased stringency (to about 1 molecule/clone per well). The remaining modules then retest each molecule one or more times, verifying the previously identified activity. The assay module can also incorporate a secondary assay to further verify desired activities.

[0588] (7.) Second Round Shuffling

[0589] The reconfirmed hits are optionally used as substrates in subsequent shuffling reactions, with this process being iteratively (and automatically) repeated by the various modules of the system, until a desired activity level for the target is obtained.

[0590] (8.) Example Machine Configuration

[0591] FIG. 8 provides an exemplar configuration for a recombination and selection machine, showing plate stacker 801, gantry robot 805, pipetting heads 807, plate gripper 809, plate reader 811, thermocycler 813, plate holders 815, solution reservoirs 817 and reagent tubes 819. During operation of the device, plates are transferred from plate stacker 801 by plate gripper 809 to plate holders 815 and to the various operation regions such as thermocycler 813 and plate reader 811. Plates are also optionally transferred back to plate stacker 801. Reagents are transferred to and from reagent tubes 819 and solution reservoirs 817 via pipetting heads 807, which also transfer materials between reagent tubes 819, solution reservoirs 817 and any plates used in the system.

[0592] (9.) Example Miniature Configuration

[0593] In this example, a miniature laboratory system is used, e.g., to perform a shuffling reaction. As shown in FIG. 19, the system includes an appliance and a microfluidic chip which has environmental control layer 19-1, microfluidics layer 19-2 and support layer 19-3, as well as optical interface for temperature control 19-4 and power supply 19-5 (see also, FIGS. 20-22). In operation, the miniature laboratory system is used, e.g., in combination with a module that provides reagents and optimal environmental conditions. Starting materials that are provided include DNA (genes/gene fragments, oligonucleotides, etc.), reagents, primers, vectors, etc. The product of the system is, e.g., a gene library of diversified genes, operons, etc. Additional steps can be included in the system for additional reactions, if needed. Where purification steps are desired, membrane filters are optionally positioned in the flow lines, e.g., binding reagents or components that are to be removed. The microfluidics system that is used in the miniature laboratory system is used to guide and direct low volume samples containing, e.g., 0.05-100 ng/μl of DNA. Using advanced separation systems and DNA reaction chambers, DNA shuffling can be performed in the miniature laboratory system.

[0594] As shown in FIG. 19, in one embodiment, a three-layer chip construction is used to provide the microfluidic portion of the overall system. The bottom layer is for support, the middle layer contains channels that guide DNA solutions and reagent solutions, and the top layer provides contact points for a power supply and a temperature controller (e.g., operating by conductivity or light). Details regarding the top layer are found in FIG. 20. Samples are transported through the system, e.g., by air pulses or other fluid driving means. Details regarding the fluidics layer are set forth in FIG. 21. An appliance (FIG. 22) contains the operation hardware (and optionally software) for the miniature laboratory system, including PCR programs, incubation periods, DNA separation and sample product import/export. The appliance also optionally interfaces with a computer to provide additional control features. The complete system provides means to generate libraries of shuffled genes directly, by supplying starting DNA, reagents, oligonucleotide primers and vectors. The resulting DNA sample is directly introduced into, e.g., a cell of choice by transformation, electroporation, conjugation, particle bombardment, injection, etc.

[0595] I. Example DNA Shuffling Machine (Alternate Embodiment): Comparison of Alternate Breeding Strategies

[0596] One way to develop more sophisticated breeding strategies is to empirically compare different breeding strategies. A DNA shuffling machine allows for increased throughput and accuracy in molecular matings.

[0597] Standard DNA shuffling is done, e.g., by purifying DNA fragments on gels, assembling fragments in a PCR machine, rescuing fragments in a PCR machine, and then cloning the final rescued product. The essential constraint with this approach is that it requires skilled labor, and it is typically costly for a given person to sample more than a few shuffling variables. However, there are many variables of interest, such as pairwise vs. pooled matings, fragment size, stoichiometry of the parental genes, degree of random mutation vs. generating diversity by recombination, etc.

[0598] This example provides a solution to this difficulty by automating the shuffling process, providing scalability and other advantages. The example DNA shuffling machine which is the subject of this example is embodied in FIGS. 10 (showing a schematic of the DNA shuffling machine), 11 (showing a schematic of a DNA fragmentation device), 12 (showing a schematic of a DNA fragment analysis and isolation device), 13 (showing a schematic of a DNA fragment preparation device), 14 (showing a schematic of a precision microamplifier), 15 (showing a schematic of a DNA assembly and rescue module), 16 (showing a schematic of a recombination analysis device), and 17 (showing a schematic of a recombination analysis device).

[0599] FIG. 10 describes an overall DNA shuffling machine (10-1). This device/system can be built either as an integrated unit, or as a separate module. It can be designed to handle multiple samples in parallel, as each of the modules is scalable. As shown, input elements including, e.g., plasmids, PCR products, genomic DNAs, primers, etc. are fragmented in DNA fragmentation device or module 10-2. Also included is DNA assembly and rescue device or module 10-3, providing for outputs, e.g., in the form of recombined/shuffled inserts. Finally, recombination and analysis module or device 10-4 provides for recombination analysis on any recombined/shuffled materials (e.g., shuffled insert DNAs).

[0600] FIG. 11 describes a DNA fragmentation device. For the purpose of automation, a reliable, preparation independent method to produce fragments of a desired size is useful. Sonication is a useful method because the fragment length depends on purely physical parameters such as the frequency of sonication and the viscosity of the fluid. However, one issue with this method is the type of ends that are generated, as 3′ hydroxyl ends are preferred for subsequent assembly steps to work. The addition of chemical cleaving agents can improve the yield of 3′ hydroxyls in the sonication reaction. Enzymatic treatment with a nuclease that is specific for, for example, 3′ phosphates, improves the quality of sonicated fragments for DNA shuffling reactions. Other fragmentation methods discussed supra can also be adapted to the present example, such as the use of point-sink shearing methods, synthesis, etc.

[0601] FIG. 12 describes a DNA fragment analysis and isolation device. A capillary electrophoresis instrument (e.g., column 12-1) is used to separate the DNA fragments. A detector monitors fluorescently labeled markers on the column, and the column output is directed to a “waste” or to a “collection” reservoir. This allows for automated collection of DNA fragments in the size range that is programmed by the user. For example, one can collect 25-50 bp fragments. An analytical instrument, made of components similar to those used for sequencing gels, can be used for analytical runs, e.g., for analysis of PCR with recombination oligos or for analysis of raw assemblies to assess the efficiency of assembly.

[0602] FIG. 13 describes a DNA fragment prep device. The DNA is denatured to expose or create single stranded DNA that binds efficiently to a C18 hydrophobic column and which can be quantitatively eluted and concentrated. This uses the principle of the SEP-PAK C18 column, but is modified for use in an automated device. Alternatives to this approach include ion exchange chromatography, precipitation, lyophilization, etc.

[0603] FIG. 14 describes a precision microamplifier (PMA). DNA 14-6 is placed in microcapillary 14-7 between two drops of oil (14-4, 14-5) to seal it against evaporation. Typical drop sizes range from 1 nl to 1 μl. The microcapillary is moved through three resistors (14-1, 14-2, 14-3) whose temperatures are programmed. As depicted, robotic arm 14-8 is used to move the capillary, and thus the DNA droplet, e.g., between resistors 14-1, 14-2, and 14-3. In the simplest case, the resistors are set for, e.g., 93, 45 and 72 degrees centigrade. By moving cyclically through these temperatures, a PCR or assembly reaction can be driven in a microdroplet in the microcapillary. A chief advantage of this relative to a standard PCR machine is that the temperature can be controlled more precisely, and, more importantly for DNA shuffling, the volume of the assembly reaction can be driven into the submicroliter range very easily. This allows shuffling using small quantities of fragments, allowing for more molecular “crosses” in the shuffling reactions from a given amount of input DNA.

[0604] FIG. 15 describes DNA assembly and rescue module 15-1. Assembly is done in a modified PCR machine or in the PMA (depicted as assembler 15-2). The PMA, or similar low-volume/high throughput methods, provide one preferred approach, because one can amplify very small volumes, which provides for shuffling using a smaller quantity of fragmented DNA. The analyzer provides a quantitative way to monitor the size and distribution of PCR products and the properties of PCR rescue. A clean and efficient rescue of a unit length of a gene fragment is preferred. The size distribution of assembled product and the properties of the rescue PCR are highly informative for predicting the efficiency of shuffling that has occurred. The analysis can be done by capillary electrophoresis or by mass spec. As depicted, various inputs, including random DNA fragments, overlapping PCR fragments and the like are assembled in assembler 15-2. The assembly and rescue module further includes rescue PCR element 15-3 and analyzer 15-4 (e.g., including a capillary electrophoresis module). Assembly module 15-1 produces outputs including assembled fragments, rescued PCR inserts and the like. Analyzer 15-4 provides profile information including size distribution information.

[0605] FIG. 16 describes recombination analysis device/module 16-1. Inputs include raw assembled components and PCR rescued assembled components. Outputs include analysis of the ratio of recombined to parental sequences. In the device, “crossover oligos” prime one or another parent exclusively, and thus, a 5′ oligo from P1 and a 3′ oligo from P2 only PCR amplify a recombinant such as F1(B). The analyzer is, for example, a capillary electrophoresis machine that precisely measures the size and intensity of each band. By using multiple fluorophores in the crossover oligos, one can measure, e.g., all four PCR products of the amplification in a single lane, if desired. In the figure, P1=parent #1; P2=parent #2; F1(A) and F1(B) are recombinants with structures with respect to the crossover oligos as shown. The crossover oligos are sets of oligos that exclusively (or at least preferentially) prime the indicated parents. The strategy can be generalized to accommodate multiple pairs of crossover oligos. An advantage of the recombination analysis device is that it allows one to quantitatively monitor the shuffling reaction. For example, if 100-200 base fragments are used in the shuffling, then crossover oligos that are 300 bp apart in the assembled genes are almost fully recombined (recombinants F1(A) and F1(B) bands of only half the intensity of the parental bands).
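As an illustration of how the band intensities measured by the analyzer could be reduced to a single number, the following sketch computes the fraction of total signal found in the two recombinant bands. The band labels follow the figure (P1, P2, F1(A), F1(B)); the use of a simple intensity ratio, without correction for primer efficiency or fluorophore brightness, is an assumption of this example, and the function name is hypothetical.

```python
def recombinant_fraction(bands):
    """Fraction of total band intensity contributed by recombinant products.

    bands: dict with intensities for "P1", "P2", "F1(A)" and "F1(B)",
           e.g., peak areas from a capillary electrophoresis trace.
    """
    parental = bands["P1"] + bands["P2"]
    recombinant = bands["F1(A)"] + bands["F1(B)"]
    total = parental + recombinant
    if total <= 0:
        raise ValueError("no signal in any band")
    return recombinant / total


if __name__ == "__main__":
    # Recombinant bands at half the intensity of the parental bands
    trace = {"P1": 100.0, "P2": 100.0, "F1(A)": 50.0, "F1(B)": 50.0}
    print(f"{recombinant_fraction(trace):.3f}")
```

Tracking this fraction across assembly conditions gives one quantitative readout for comparing shuffling reactions before committing to cloning or screening.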

[0606] The DNA fragmentation device and the DNA fragment prep device take the tedium out of preparing gene fragments. They can also increase the yield of fragments of the desired size. The assembly and rescue device allows one to test multiple assembly conditions, e.g., if the precision microamplifier is used for the assembly. The analysis instrument allows one to quantitatively monitor the growth of the shuffled product. This analysis capability is useful for troubleshooting, which ultimately makes the process even more predictably automatable.

[0607] The recombination/analysis device allows one to quantitatively measure the frequency of recombination between any known DNA polymorphisms in the parental genes. This analysis is useful in the optimization of shuffling reactions generally. It is similar in effect to measuring recombination frequencies in populations. Importantly, it allows one to make an educated decision as to whether a given shuffling reaction is worth cloning, or in vitro expressing and screening in functional assays, as opposed to doing further work to optimize the shuffling reaction to get a desired spectrum of recombinants. This is of particular value when the number of clones that can be screened is limited or costly.

[0608] J. Example: Establishment and Automated Processing of Expression Arrays for Nucleic Acids Derived from a Variety of Sources

[0609] Identification and characterization of genes from macro- and micro-organisms, enrichment cultures, fermentation broths, uncharacterized environmental isolates, and the like is of commercial value. These genes can be used as substrates in the various diversity generation reactions herein. Various approaches for using diverse sources of materials in the systems of the present invention are schematically outlined in FIGS. 23-30.

[0610] In the process embodiment of FIG. 23, nucleic acids are sourced from any of a variety of diverse sources, including any of those listed in the figure (humans and other vertebrates, other eukaryotes, oligonucleotides and gene synthesis, etc.). The nucleic acids are extracted and/or pooled. Optionally, the pooled nucleic acids are cloned, selected, hybridized, sized, etc. The nucleic acids are then arrayed. The arrayed nucleic acids are then optionally cloned, selected, hybridized, amplified, etc. The arrays are replicated, transcribed and/or translated. The genes can be encapsulated if desired. Proteins or bioactive RNAs are screened for activities of interest. Finally, a physical or logical linkage between the array members and the relevant observed phenotypes is established.

[0611] In the process embodiment of FIG. 24, nucleic acids are sourced from any available source, including one or more of those listed in the figure, and extracted/pooled. Nucleic acids are treated with one or more enzymes, ligated into one or more vectors and introduced into cells. The nucleic acids are propagated in the cells. Optionally, the cells or expressed nucleic acids can initially be arrayed. Clones of interest are selected using a plurality of screens, such as hybridization, complementation, etc. The selected nucleic acids are arrayed and the arrays replicated. One or more of the replicated arrays is transcribed and/or translated. Optionally, other arrays or array members can be cloned, selected, hybridized, etc. Bioactive RNAs or proteins are selected for one or more activities and, again, a physical or logical linkage between the array members and the relevant observed phenotypes is established.

[0612] In the process embodiment of FIG. 25, the sourced nucleic acids (again, from any of a variety of diverse sources, including any of those listed in the figure) are extracted and/or pooled, hybridized with at least one synthetic or naturally occurring nucleic acid or population from another source, and treated with at least one enzyme including at least one polymerase or ligase activity. Nucleic acids are arrayed and arrays replicated. Optionally, the arrays or array members include any of a variety of additional operations, including cloning, selection, hybridization, etc. Bioactive RNAs or proteins are selected for one or more activities and, again, a physical or logical linkage between the array members and the relevant observed phenotypes is established.

[0613] In the process embodiment of FIG. 26, sourced nucleic acids (also from any of a variety of diverse sources, including any of those listed in the figure) are extracted and/or pooled. The resulting nucleic acids are hybridized with at least one synthetic or naturally occurring nucleic acid or population from another source. The resulting hybridization mixture is treated with at least one enzyme containing at least one polymerase and/or ligase activity. The resulting nucleic acids are ligated into a vector, introduced into cells and propagated. Optionally, an initial array of the resulting library is performed at this stage of the overall process. Library members (clones) are selected using one or more screens. The selected members are arrayed and the arrays replicated. Bioactive RNAs or proteins are selected for one or more activities and, again, a physical or logical linkage between the array members and the relevant observed phenotypes is established.

[0614] In the process embodiment of FIG. 27, nucleic acids are sourced from any of a variety of diverse sources, including any of those listed in the figure (humans and other vertebrates, other eukaryotes, oligonucleotides and gene synthesis, etc.). The nucleic acids are extracted and/or pooled. Optionally, the pooled nucleic acids are cloned, selected, hybridized, sized, etc. The nucleic acids are then arrayed. The arrayed nucleic acids are then optionally cloned, selected, hybridized, amplified, etc. The arrays are replicated, transcribed and/or translated. The genes can be encapsulated if desired. Proteins or bioactive RNAs are screened for activities of interest. In this embodiment, the properties which are screened include fluorescent or luminescent properties of a particle such as a cell, encapsulated mixture or other matrix, liposome or membrane encapsulated material which incorporates a viral coat protein, or other encapsulated material. The cell or other encapsulated material is used to decide the end locations of such particles on an array comprising at least two designated end locations or chambers. Detection is via FACS, microFACS (with or without a fluorescent signal), fluorescence, visible scanning, transmission or confocal microscopy, digital or high-density signal imaging, thermography, liquid chromatography, combinations thereof, or the like. A physical or logical linkage between the array members and the relevant observed phenotypes is then established.

[0615] In the process embodiment of FIG. 28, nucleic acids are sourced from any of a variety of diverse sources, including any of those listed in the figure (humans and other vertebrates, other eukaryotes, oligonucleotides and gene synthesis, etc.). The nucleic acids are extracted and/or pooled. Optionally, the pooled nucleic acids are cloned, selected, hybridized, sized, etc. The nucleic acids are then arrayed. The arrayed nucleic acids are then optionally cloned, selected, hybridized, amplified, etc. The arrays are replicated, transcribed and/or translated. The genes can be encapsulated if desired. Proteins or bioactive RNAs are screened for activities of interest. In this embodiment, the screening comprises combination screening of the proteins or bioactive RNAs. Properties which are screened include fluorescent or luminescent properties of a particle such as a cell, encapsulated mixture or other matrix, liposome or membrane encapsulated material which incorporates a viral coat protein, or other encapsulated material. The cell or other encapsulated material is used to decide the end locations of such particles on an array, e.g., comprising at least two designated end locations or chambers. Detection is via FACS, microFACS (with or without a fluorescent signal), fluorescence, visible scanning, transmission or confocal microscopy, digital or high-density signal imaging, thermography, liquid chromatography, combinations thereof, or the like. In addition, the array, e.g., at at least one of the end locations, comprises a population of target cells in which a given biological activity is directly assessed, such as cytocidal or antibiotic activities, stimulation or suppression of growth, generation of a detectable signal, or the like. A physical or logical linkage between the array members and the relevant observed phenotypes is then established.

[0616] In the process embodiment of FIG. 29, nucleic acids are sourced from any of a variety of diverse sources, including any of those listed in the figure (humans and other vertebrates, other eukaryotes, oligonucleotides and gene synthesis, etc.). The nucleic acids are extracted and/or pooled. Optionally, the pooled nucleic acids are cloned, selected, hybridized, sized, etc. The nucleic acids are then arrayed. The arrayed nucleic acids are then optionally cloned, selected, hybridized, amplified, etc. The arrays are replicated, transcribed and/or translated. The array members are also encapsulated in this embodiment. Proteins or bioactive RNAs are screened for activities of interest. In this embodiment, the properties which are screened can include fluorescent or luminescent properties of a particle, encapsulated mixture, liposome, or mixture encased in a membrane comprising one or more viral coat proteins which are used to decide, e.g., end locations of such particles on an array, e.g., comprising at least two designated end locations or chambers. Such methods include any combination of FACS or microFACS (with or without a fluorescent signal); fluorescent, visible, scanning, transmission and confocal microscopy; digital or high density digital imaging, thermography, liquid chromatography, and the like. A physical or logical linkage between the array members and the relevant observed phenotypes is then established.

[0617] In the process embodiment of FIG. 30, nucleic acids are sourced from any of a variety of diverse sources, including any of those listed in the figure (humans and other vertebrates, other eukaryotes, oligonucleotides and gene synthesis, etc.). The nucleic acids are extracted and/or pooled. Optionally, the pooled nucleic acids are cloned, selected, hybridized, sized, etc. The nucleic acids are then arrayed. The arrayed nucleic acids are then optionally cloned, selected, hybridized, amplified, etc. The arrays are replicated, transcribed and/or translated. The genes can be encapsulated if desired. Proteins or bioactive RNAs are screened for activities of interest. In this embodiment, the screening comprises combination screening of the proteins or bioactive RNAs. Properties which are screened include fluorescent or luminescent properties of a particle such as a cell, encapsulated mixture or other matrix, liposome or membrane encapsulated material which incorporates a viral coat protein, or other encapsulated material. The cell or other encapsulated material is used to decide the end locations of such particles on an array, e.g., comprising at least two designated end locations or chambers. Detection is via FACS, microFACS (with or without a fluorescent signal), fluorescence, visible scanning, transmission or confocal microscopy, digital or high-density signal imaging, thermography, liquid chromatography, combinations thereof, or the like. In addition, the array, e.g., at at least one of the end locations, comprises a population of target cells in which a given biological activity is directly assessed, such as cytocidal or antibiotic activities, stimulation or suppression of growth, generation of a detectable signal, or the like. A physical or logical linkage between the array members and the relevant observed phenotypes is then established.

[0618] The field of gene isolation is well developed, e.g., in the expression array (e.g., GeneChip™, Affymetrix, Santa Clara, Calif.) and eukaryotic genomics areas, in which, e.g., RNA or genomic DNA is used to detect or sequence novel open reading frames. While tools for sequencing complex genomes of higher organisms have advanced rapidly, less work has been done on sequencing, deconvoluting or otherwise characterizing the genetic properties of microorganisms and microbial systems. Furthermore, while the generation and use of hybridization and sequencing arrays has undergone significant advancement, many of the advances are based on the ability to identify and purify the messenger RNA or intact high MW genomic DNA from higher organisms.

[0619] For eukaryotic mRNA, the presence of a poly-adenylated tail allows rapid creation and use of convenient EST (expressed sequence tag) libraries. Since lower organisms rarely exhibit such tails, other tools are used for rapid cloning, characterization and analysis.

[0620] Recently, methods for extracting nucleic acids at high yield from microbial cultures, broths, pathogen and environmental samples have been described. Where complex, soil-containing or mixed culture systems are targeted for characterization or gene mining, these methods generally use any of a variety of treatments to provide high yield, high purity nucleic acids. For example, a variety of publications and patents describing such methods are listed herein. Examples include Short “PRODUCTION OF ENZYMES HAVING DESIRED ACTIVITIES BY MUTAGENESIS” U.S. Pat. No. 5,939,250 (See also http://www.accessexcellence.com/AB/IWT/1297xtremo.html and http://www.diversa.com/techplat/techover.asp), Thompson, et al. (1998) “METHODS FOR GENERATING AND SCREENING NOVEL METABOLIC PATHWAYS” U.S. Pat. Nos. 5,824,485 and 5,783,431; and Carlson, et al. (1999) “METHOD OF RECOVERING A BIOLOGICAL MOLECULE FROM A RECOMBINANT MICROORGANISM” U.S. Pat. Nos. 5,908,765, 5,837,470 and 5,773,221, which allege various methods for creating libraries from, e.g., uncharacterized heterogeneous microbial samples. The present invention provides, e.g., for automation, spatial or logical arrays and associated tools in mediating, improving or replacing these processes.

[0621] Often, effective development of a commercially relevant enzyme, protein or biochemical pathway (e.g., for pharmaceutical or industrial applications) involves identifying a plurality of favorable activity parameters to be encoded by the candidate gene(s). Having a means of rapidly recruiting and then diversifying a wide variety of starting genes from a wide variety of sources (such as may share a common structural or activity motif) is of importance for rapid gene or pathway development. The present application teaches the application of a family of array operations and automated processing of a wide range of mutagenesis, gene synthesis and recombination technologies for improving candidate genes.

[0622] While preliminary gene recruitment can be done by hybridization or on the basis of logically derived and/or stored hybridization information, hybridization is often not used in confirming the activity or intactness of a given nucleic acid within a physical array. For more refined recruitment or identification of promising candidate genes within an array, it is useful to have at least one other biochemical activity measurement on which to contrast the various members of the storage array. The current invention contemplates and describes a large number of logical and laboratory-based criteria and processes for storing, maintaining and recording that information and its physical or logical linkage with given members of the array. Thus a member of an array is most accurately defined on the basis of its activity in each of the tests performed on it.

[0623] A wide variety of phenotypic attributes or combinations of such attributes are useful for identifying genes suitable for a given application, process, pathway or subsequent evolution toward such applications. In addition to simply creating libraries from diverse samples, expressing such libraries in cells or on phage, and analyzing the results biochemically, the present invention provides, e.g., for automated, integrated or integrateable modules for rapidly producing and characterizing expression arrays, e.g., by way of in vitro transcription or translation tools. The present methods also describe the utility and design of automated processes for identifying, cataloging, selecting and subsequently evolving genes from natural or synthetic systems.

[0624] In one embodiment, the present invention describes an automated process for recruiting genes from natural, synthetic or logical sources and storing genetic material suitable for subsequent characterization, mutagenesis, selection and evolution. In another embodiment, it describes the automated devices or modules which carry out such processes.

[0625] In addition, the present invention describes a series of general, automatable methods for high-yield extraction of nucleic acids from a wide variety of samples. In these methods, samples containing nucleic acids (e.g., as from diverse or clonal cultured or uncultured cellular populations; tissue sections; sera samples; samples from heterogeneous enrichment cultures, bioreactors or fermentors; samples containing one or more uncharacterized microorganism; environmental isolates; soil, water or microcosm samples) are treated by a method, e.g., comprising the following processes.

[0626] First, the sample is treated with a plurality of chemical lysing agents (consisting of: chaotropic substance(s), detergent(s), chelator(s), proteinase(s), exo- or endo-glucanase(s), lysozyme(s) and other proteoglycan or cell wall degrading enzymes, etc.) under conditions which allow the lysing agent or agents to come into liquid contact with the cell membranes of the target cells. The plurality of lysing agents can include a chaotropic agent capable of substantially inactivating a wide variety of nucleases. Similarly, the plurality of lysing agents can include at least one chaotrope and at least one enzyme for lysis. Examples of lysing agents include urea, guanidine and guanidinium, enzymes, etc. Any one or more of these chemical or physical lysing conditions can be used on a given sample, or a sample may be subdivided and subjected to sequential or combinatorial lysis to: a) identify optimal lysing conditions, b) prepare multiple unique extracts from a single sample and/or c) conduct parallel sample preparation, for any purpose.
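
Subdividing a sample for combinatorial lysis can be pictured as enumerating combinations of lysing agents and assigning one aliquot to each. The following Python sketch is only an illustration under stated assumptions: the function names, the aliquot labelling scheme and the three example agents are hypothetical, and the minimum of two agents per condition simply reflects the "plurality" of agents mentioned above.

```python
from itertools import combinations


def lysis_conditions(agents, min_agents=2):
    """List every combination of at least `min_agents` lysing agents."""
    conditions = []
    for size in range(min_agents, len(agents) + 1):
        conditions.extend(combinations(agents, size))
    return conditions


def assign_aliquots(sample_id, agents, min_agents=2):
    """Map a hypothetical aliquot label to the agent combination applied to it."""
    return {
        f"{sample_id}-{index:02d}": combo
        for index, combo in enumerate(lysis_conditions(agents, min_agents), start=1)
    }


# Example agents taken from the classes named in the text; the choice is arbitrary.
plan = assign_aliquots("S1", ["urea", "guanidinium", "lysozyme"])
for aliquot, combo in plan.items():
    print(aliquot, "+".join(combo))
```

The number of aliquots grows quickly with the number of agents, so a practical plan would presumably restrict the combinations to those compatible with the available sample volume.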

[0627] Second, the samples can be treated with at least one disruptive physical condition(s) or treatment(s) (e.g., freeze-thawing, freeze drying, cold-hot cycling, disruptive (rapid) mixing, sonicating, heating, incubation at pH<5.5 or >8.5, etc.). The at least one disruptive physical condition or treatment can include incubation at a temperature above 37° C. and, e.g., at a temperature of >50° C. The at least one disruptive physical condition or treatment can include at least one freeze-thaw, mixing or sonication step and incubation at a temperature of >50° C. The at least one disruptive physical condition or treatment can include at least one heating or cooling step and at least one step which can cause (such as mixing, vortexing, sonicating or incubating in hypotonic media) physical shearing of cell walls and high molecular weight DNA.

[0628] The sample can be subjected to at least one physical-chemical separation step (which may be chosen from, or achieve results similar to, precipitation, solvent extraction, electrophoretic or chromatographic separation or others) to isolate high purity nucleic acids, e.g., from enriched cultures, natural isolates, cultured cells, tissues or sera. For example, at least one alcohol mediated precipitation step or one extraction step can be used. A plurality of physico-chemical separation modes can be used in the extraction process. At least one extraction step and one precipitation or chromatographic step can be used in combination.

[0629] In a preferred embodiment, the process described here is conducted under conditions in which a plurality of lysing agents and disruptive physical agents are used on the sample and in which the operation is integrated into an automated device.

[0630] The automation of such a method provides a free-standing and uniquely valuable platform from which to conduct high throughput nucleic acid extraction and purification from diverse sample sources. Nucleic acids prepared in such a way can be further characterized or selected, with or without prior cloning, by hybridization-based detection, capture (e.g., ‘panning’) or direct recombination with other members of the population or exogenous nucleic acids added to the mixture, followed by expression screening.

[0631] Expression screening can involve at least one in vitro transcription or translation step. For example, it can involve in vitro transcription preceded by at least one amplification, polymerization or ligation event in which at least one transcriptional regulatory element is operatively linked to the nucleic acids to undergo transcription. In a presently preferred embodiment, the method involves the in vitro translation of library members using transcripts derived from either in vitro, synthetic or cellular sources.

[0632] The present invention describes, e.g., the following automated modules for the isolation, detection and evolution of nucleic acids from natural and synthetic isolates: nucleic acid isolation modules, nucleic acid generation modules, nucleic acid sorting or selection modules, dilution modules, array replication modules, expression modules, screening modules, etc. Such modules can operate as free-standing devices or as sub-elements of a larger device or other system which links one or more of these modules physically or logically to create, modify, analyze, replicate or otherwise manipulate members of interest within the array.

[0633] The present invention also provides a logical association for organizing a multiple-phenotype screening array. For example, the present invention provides for detection and screening of genes in a primarily binary process, where individual clones, proteins or enzymes (whether protein or nucleic acid, or both) are identified as either having or not having a specified property or set of properties (resulting in a binary “yes/no” logical operation by the system in evaluating the properties). In addition to strictly binary processes, degrees of activity can also be detected and manipulated by the system.

[0634] The invention can also include the organization of multi-phenotype screening in which (one or more) clones in the array are described, organized, screened or otherwise sorted (in physical or computational terms) by their activity fingerprints, such that characterization of the array is open-ended and allows for increasingly diverse layers of characterization to be applied. Such arrays can remain closed-ended with respect to their origin or member nucleic acids. In one embodiment, the array architecture allows for each clone, pool of clones, individual or individual pools of nucleic acids within an array to be described in both (or either) binary and quantitative terms with respect to a given activity or property and provides a means for further isolation, processing or characterization of those members selected on the basis of either Boolean or quantitative queries, or combinations of the two.
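
A logical array of this kind can be pictured as a set of records, each holding binary and quantitative entries that together form the member's activity fingerprint. The following Python sketch is a minimal illustration under stated assumptions: the `ArrayMember` class, the `query` function, the property names and the example values are all hypothetical and do not represent the system's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class ArrayMember:
    position: str  # physical or logical address, e.g. a well label
    binary: dict = field(default_factory=dict)        # property -> True/False
    quantitative: dict = field(default_factory=dict)  # property -> measured value


def query(members, required=(), thresholds=None):
    """Select members that have every property in `required` (Boolean part)
    and reach every minimum value in `thresholds` (quantitative part)."""
    thresholds = thresholds or {}
    selected = []
    for member in members:
        has_all = all(member.binary.get(name, False) for name in required)
        meets_all = all(
            member.quantitative.get(name, float("-inf")) >= minimum
            for name, minimum in thresholds.items()
        )
        if has_all and meets_all:
            selected.append(member)
    return selected


# Hypothetical fingerprints for three array positions.
members = [
    ArrayMember("A1", {"expressed": True, "soluble": True}, {"catalytic_rate": 12.0}),
    ArrayMember("A2", {"expressed": True, "soluble": False}, {"catalytic_rate": 40.0}),
    ArrayMember("A3", {"expressed": True, "soluble": True}, {"catalytic_rate": 3.5}),
]
hits = query(members, required=("expressed", "soluble"),
             thresholds={"catalytic_rate": 10.0})
print([member.position for member in hits])
```

Because the fingerprints are stored as dictionaries, further layers of characterization can be added as new keys without changing the set of members, which mirrors the open-ended characterization of a closed-ended array described above.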

[0635] While not limited to these, the query-able properties include biological or chemical activities, physical or structural attributes, nucleic acid or amino acid sequences, source, prior processing methods, histories or exposures or physical state within the array. In another embodiment, the present invention provides for the automated or semi-automated amplification, replication and in vitro transcription and/or translation of the physical array to create sub-arrays which can be stored or screened for other properties. In preferred embodiments, the present invention describes a process and a device for isolating nucleic acids from natural or synthetic or computational sources, storing such nucleic acids as logical (or physical) arrays based on a plurality of phenotypes (one of which may be the nucleotide sequence) and contacting the arrays with one or more in vitro transcription or translation reagents.

[0636] In the present invention, the term ‘phenotype’ is used to refer to a general or specific set of traits for which a given clone has been screened. The complete complement of phenotypic traits may be derived directly from laboratory data, by logical inference from such data or from stored databases of relevant data (e.g., activity, sequence or relational databases). These traits can be directly or indirectly screened, including for stability under natural or non-natural physical or chemical conditions, expressibility in a given cell line, strain or in vitro extract, size, solubility, hybridization properties, sequence, associated regulatory elements, catalytic rate, substrate or product selectivity, luminescent, fluorescent, light scattering, x-ray diffracting, sedimentation, binding, calorimetric, refractive or other diverse properties.

[0637] The arrays of the invention have value in all areas in which gene products have utility, including pharmaceutical and chemical discovery and manufacturing, agriculture, diagnostics, biofuels, fuel cells and bioelectronics, and many other areas. Such arrays are developed, e.g., from gene libraries extracted from nature or natural sources. They can also be derived computationally or via automated gene or oligonucleotide synthesis. In addition, analogous or derivative arrays may be generated via the application of shuffling or other mutagenesis methods to one or more parental nucleic acids.

[0638] While each phenotypic attribute is of value in describing a given member of an array, certain combinations of properties can be particularly useful in characterizing genes for utility in pharmaceutical or chemical manufacturing processes. For example, an array in which at least one physical attribute and at least one selectivity attribute are measured for a plurality of members of that array can be more valuable than one in which only the expression, selectivity or stability attribute has been assessed.

[0639] Similarly, an array containing enzymes (or cells expressing such enzymes) which have been quantitatively characterized for their ability to stereoselectively convert a substrate to a given product under a defined solvent or temperature regime is more informative to the synthetic or process chemist interested in the given conversion than one in which only one of the properties listed has been examined. For synthetic and process chemistry applications, physical chemical attributes of interest include many diverse attributes. Examples include stability or activity in solvents or mixed water-solvent systems (common solvents would, for example, include polar protic and aprotic solvents, nonpolar solvents, alcohols, ethers, esters, alkanes, halogenated solvents, phenols, tetrahydrofuran, benzene and its derivatives, aromatic, fluorinated and perfluorinated solvents, etc.), stability or activity at elevated or depressed temperatures (e.g., above 50° C. and below 20° C.; e.g., >70° C. or <10° C.), and stability or activity in high or low salt concentrations (e.g., >1 M or <0.050 M sodium, potassium and ammonium containing salts with chloride, bromide, nitrate, nitrite, sulfate, sulfite, carbonate, bicarbonate or amino acid counterions). Similarly, stability or activity at high or low pressure, in oxygen-rich or oxygen deficient environments and/or in the presence of one or more agents capable of inactivating proteins by covalent modification (e.g., acylating, alkylating and amide reactive agents), stability or activity in the presence of at least one phase transfer or crosslinking agent, or stability or activity within or upon a solid matrix (e.g., by covalent or noncovalent association with a natural or functionalized surface, the surface comprising a hydrophobic or hygroscopic polymer, silica, glass, metal, aluminum, alloy, cellulosic or modified cellulosic, hygroscopic insoluble material, a natural biopolymer, a polysaccharide and modified forms of these) can also be of interest.

[0640] Selectivity attributes of interest in process and combinatorial chemistry include, but are not limited to, product or substrate chemoselectivity, regioselectivity, stereoselectivity and enantioselectivity; and each of these in combination with a plurality of solvent and physical conditions such as those described above. Thus the present invention describes means of making and using logical and/or physical enzyme arrays in which each member has been characterized on the basis of its activity under at least one nonphysiological physical condition and at least one selectivity attribute. For example, the at least one nonphysiological condition can involve one or more of the following conditions: a nonphysiological thermal, salt, solvent, pressure, or oxygen condition; the presence of active levels of one or more crosslinking agents; the presence of active levels of one or more potential covalent modifying agents; or immobilization upon a nonbiological surface.
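
The requirement that each array member be characterized under at least one nonphysiological condition and for at least one selectivity attribute can be expressed as a simple check. The following Python sketch is illustrative only: the record layout and function names are hypothetical, the temperature and salt cutoffs are the example values given above (above 50° C. or below 20° C.; >1 M or <0.050 M salt), and the other listed conditions (solvent, pressure, oxygen, crosslinking or modifying agents, immobilization) are omitted for brevity.

```python
SELECTIVITY_ATTRIBUTES = {
    "chemoselectivity",
    "regioselectivity",
    "stereoselectivity",
    "enantioselectivity",
}


def is_nonphysiological(condition):
    """Return True if a test condition falls outside the example ranges.

    `condition` is a dict with optional keys "temperature_c" and "salt_molar".
    """
    temperature = condition.get("temperature_c")
    salt = condition.get("salt_molar")
    if temperature is not None and (temperature > 50 or temperature < 20):
        return True
    if salt is not None and (salt > 1.0 or salt < 0.050):
        return True
    return False


def is_characterized(member):
    """Check for at least one nonphysiological condition and one selectivity attribute."""
    has_condition = any(is_nonphysiological(c) for c in member["conditions"])
    has_selectivity = any(name in SELECTIVITY_ATTRIBUTES for name in member["selectivity"])
    return has_condition and has_selectivity


# Hypothetical array members; all values are invented for illustration.
thermostable = {
    "conditions": [{"temperature_c": 70, "salt_molar": 0.15}],
    "selectivity": {"enantioselectivity": 0.98},
}
untested = {
    "conditions": [{"temperature_c": 37, "salt_molar": 0.15}],
    "selectivity": {"enantioselectivity": 0.90},
}
print(is_characterized(thermostable), is_characterized(untested))
```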

[0641] K. Further Embodiments

[0642] In a further aspect, the present invention provides for the use of any apparatus, apparatus component, composition or kit herein, for the practice of any method or assay herein, and/or for the use of any apparatus or kit to practice any assay or method herein.

[0643] While the foregoing invention has been described in some detail for purposes of clarity and understanding, it will be clear to one skilled in the art from a reading of this disclosure that various changes in form and detail can be made without departing from the true scope of the invention. For example, all the techniques, methods, compositions, apparatus and systems described above may be used in various combinations. All publications, patents, patent documents (including patent applications) and other references cited in this application are incorporated by reference in their entirety for all purposes to the same extent as if each individual publication, patent, patent document or other reference were individually indicated to be incorporated by reference for all purposes.

What is claimed is:
 1. A device or integrated system, comprising: a physical or logical array of reaction mixtures, each reaction mixture comprising one or more shuffled or mutagenized nucleic acids or one or more transcribed shuffled or transcribed mutagenized nucleic acids and one or more in vitro translation reagents.
 2. The device or integrated system of claim 1, further comprising a duplicate of the physical or logical array.
 3. The device or integrated system of claim 1, further comprising a bar-code based sample tracking module, which module comprises a bar code reader and a computer readable database comprising at least one entry for at least one array or at least one array member, which entry is corresponded to at least one bar code.
 4. The device or integrated system of claim 1, further comprising a long term storage device comprising one or more of: a refrigerator; an electrically powered cooling device; a device capable of maintaining a temperature of <0° C.; a freezer; a device which uses liquid nitrogen or liquid helium for cooling, storing or freezing samples; a container comprising wet or dry ice; a constant temperature and/or constant humidity chamber or incubator; or an automated sample storage or retrieval unit.
 5. The device or integrated system of claim 4, further comprising one or more modules for moving arrays or array members into the long term storage device.
 6. The device or integrated system of claim 1, further comprising a copy array comprising a copy of each of a plurality of members of the one or more shuffled or mutagenized nucleic acids in a physically or logically accessible arrangement of the members.
 7. The device or integrated system of claim 1, wherein a plurality of the reaction mixtures further comprise one or more translation products or one or more transcription products, or both one or more translation products and one or more transcription products.
 8. The device or integrated system of claim 1, wherein the array of reaction mixtures comprises a solid phase, liquid phase or mixed phase array of one or more of: the one or more shuffled nucleic acids, the one or more transcribed shuffled nucleic acids, or the one or more in vitro translation reagents.
 9. The device or integrated system of claim 1, wherein the one or more shuffled nucleic acids are homologous.
 10. The device or integrated system of claim 1, wherein the one or more transcribed shuffled nucleic acid is an mRNA.
 11. The device or integrated system of claim 1, wherein the one or more in vitro translation reagents comprise one or more of: a reticulocyte lysate, a rabbit reticulocyte lysate, a canine microsome translation mixture, a wheat germ in vitro translation (IVT) mixture, or an E. coli lysate.
 12. The device or integrated system of claim 1, further comprising one or more in vitro transcription reagents.
 13. The device or system of claim 12, wherein the in vitro transcription reagents comprise one or more of: an E. coli lysate, an E. coli extract, an E. coli s20 extract, a canine microsome system, a HeLa nuclear extract in vitro transcription component, an SP6 polymerase, a T3 polymerase or a T7 RNA polymerase.
 14. The device or integrated system of claim 1, further comprising a nucleic acid shuffling or mutagenesis module, which nucleic acid shuffling or mutagenesis module accepts input nucleic acids or character strings corresponding to input nucleic acids and manipulates the input nucleic acids or the character strings corresponding to input nucleic acids to produce output nucleic acids, which output nucleic acids comprise the one or more shuffled or mutagenized nucleic acids in the reaction mixture array.
15. The device or integrated system of claim 14, wherein the output nucleic acids comprise one or more sequence which controls transcription or translation.
16. The device or integrated system of claim 14, wherein the nucleic acid shuffling or mutagenesis module comprises a DNA shuffling module, which DNA module accepts input DNAs or character strings corresponding to input DNAs and manipulates the input DNAs or the character strings corresponding to input DNAs to produce output DNAs, which output DNAs comprise the one or more shuffled DNAs in the reaction mixture array.
17. The device or integrated system of claim 1, wherein the nucleic acid shuffling or mutagenesis module is preceded by a module which allows overlapping synthetic oligonucleotides to be first assembled into oligonucleotide multimers or functional open reading frames prior to entering the mutagenesis or shuffling module.
18. The device or integrated system of claim 14, wherein one or more module comprises or is operatively linked to a thermocycling device.
19. The device or integrated system of claim 14, wherein the nucleic acid shuffling or mutagenesis module comprises a mutagenesis module, which mutagenesis module mutagenizes the DNA.
20. The device or integrated system of claim 14, wherein the nucleic acid shuffling or mutagenesis module fragments the input nucleic acids to produce nucleic acid fragments, or wherein the input nucleic acids comprise cleaved or synthetic nucleic acid fragments.
21. The device or integrated system of claim 14, wherein the shuffling or mutagenesis module is mechanically, electronically, robotically or fluidically coupled to at least one other array operation module.
22. The device or integrated system of claim 14, wherein the nucleic acid shuffling or mutagenesis module performs one or more of: StEP PCR, uracil incorporation or chain termination.
23. The device or integrated system of claim 14 or 20, wherein the nucleic acid shuffling module comprises an identification portion, which identification portion identifies one or more nucleic acid portion or subportion.
24. The device or integrated system of claim 14 or 20, wherein the nucleic acid shuffling module comprises a fragment length purification portion, which fragment length purification portion purifies selected length fragments of the nucleic acid fragments.
25. The device or integrated system of claim 20, wherein the nucleic acid shuffling module permits hybridization of the nucleic acid fragments and wherein the nucleic acid shuffling module comprises a polymerase which elongates the hybridized nucleic acid.
26. The device or integrated system of claim 25, wherein the nucleic acid shuffling module combines one or more translation or transcription control sequence into the resulting elongated nucleic acid.
27. The device or integrated system of claim 26, wherein the one or more translation or transcription control sequence is combined into the resulting elongated nucleic acid using the polymerase, or a ligase, or both the polymerase and the ligase.
28. The device or integrated system of claim 25, wherein the nucleic acid shuffling module separates, identifies, purifies or immobilizes the resulting elongated nucleic acid.
29. The device or integrated system of claim 25, wherein the nucleic acid shuffling module determines a recombination frequency or a length, or both a recombination frequency and a length, for the resulting elongated nucleic acids.
30. The device or integrated system of claim 25, wherein the nucleic acid shuffling module determines nucleic acid length by detecting incorporation of one or more labeled nucleic acid or nucleotide into the resulting elongated nucleic acid.
31. The device or integrated system of claim 25, wherein the nucleic acid shuffling module determines nucleic acid length by detecting one or more label associated with the resulting elongated nucleic acid.
32. The device or integrated system of claim 30, wherein the label is a dye, radioactive label, biotin, digoxin, or a fluorophore.
33. The device or integrated system of claim 25, wherein the nucleic acid shuffling module determines nucleic acid length with a fluorogenic 5′ nuclease assay.
34. The device or integrated system of claim 1, wherein the physical or logical array of reaction mixtures is incorporated into a microscale device, or wherein at least one of the reaction mixtures is incorporated into a microscale device, or wherein the one or more shuffled or mutagenized nucleic acids or the one or more transcribed shuffled or mutagenized nucleic acids is found within a microscale device, or wherein the one or more in vitro translation reagents is found within a microscale device.
35. The device or integrated system of claim 25, wherein the nucleic acid shuffling module comprises one or more microscale channel through which a shuffling reagent or product is flowed.
36. The device or integrated system of claim 35, wherein the channel is integrated in a chip.
37. The device or integrated system of claim 35, wherein liquid flow through the device is mediated by capillary flow, differential pressure between one or more inlets and outlets, electroosmosis, hydraulic or mechanical pressure, or peristalsis.
38. The device or integrated system of claim 25, wherein the nucleic acid fragments are contacted in a single pool.
39. The device or integrated system of claim 25, wherein the nucleic acid fragments are contacted in multiple pools.
40. The device or integrated system of claim 25, wherein the nucleic acid shuffling module dispenses the resulting elongated nucleic acids into one or more multiwell plates, or onto one or more solid substrates, or into one or more microscale systems, or into one or more containers.
41. The device or integrated system of claim 25, wherein the nucleic acid shuffling module pre-dilutes the resulting elongated nucleic acids and dispenses them into one or more multiwell plates.
42. The device or integrated system of claim 25, wherein the nucleic acid shuffling module dispenses the resulting elongated nucleic acids into one or more multiwell plates at a selected density per well of the elongated nucleic acids.
43. The device or integrated system of claim 25, wherein the nucleic acid shuffling module dispenses the resulting elongated nucleic acids into one or more master multiwell plates and PCR amplifies the resulting master array of elongated nucleic acids to produce an amplified array of elongated nucleic acids, the shuffling module further comprising an array copy system which transfers aliquots from the wells of the one or more master multiwell plates to one or more copy multiwell plates.
44. The device or integrated system of claim 43, wherein an extent of PCR amplification is determined by one or more technique selected from: an incorporation of a label into one or more amplified elongated nucleic acid, and a fluorogenic 5′ nuclease assay.
45. The device or integrated system of claim 43, wherein the array of reaction mixtures is formed by separate or simultaneous addition of an in vitro transcription reagent and an in vitro translation reagent to the one or more copy multiwell plates, or to a duplicate set thereof, wherein the elongated nucleic acids comprise the one or more shuffled nucleic acids.
46. The device or integrated system of claim 1, further comprising one or more sources of one or more nucleic acids, the one or more sources collectively or individually comprising a first population of nucleic acids, wherein the shuffled nucleic acids are produced by recombining the one or more members of the first population of nucleic acids.
47. The device or integrated system of claim 46, the one or more sources of nucleic acids comprising at least one nucleic acid selected from: a synthetic nucleic acid, a DNA, an RNA, a DNA analogue, an RNA analogue, a genomic DNA, a cDNA, an mRNA, a DNA generated by reverse transcription, an nRNA, an aptamer, a polysome associated nucleic acid, a cloned nucleic acid, a cloned DNA, a cloned RNA, a plasmid DNA, a phagemid DNA, a viral DNA, a viral RNA, a YAC DNA, a cosmid DNA, a fosmid DNA, a BAC DNA, a P1-mid, a phage DNA, a single-stranded DNA, a double-stranded DNA, a branched DNA, a catalytic nucleic acid, an antisense nucleic acid, an in vitro amplified nucleic acid, a PCR amplified nucleic acid, an LCR amplified nucleic acid, a Qβ-replicase amplified nucleic acid, an oligonucleotide, a nucleic acid fragment, a restriction fragment and a combination thereof.
48. The device or integrated system of claim 46, further comprising a population destination region, wherein, during operation of the device, one or more members of the first population are moved from the one or more sources of the one or more nucleic acids to the one or more destination regions.
49. The device or integrated system of claim 48, further comprising nucleic acid movement means for moving the one or more members from the one or more sources of the one or more nucleic acids to the one or more destination regions.
50. The device or integrated system of claim 46, 48, or 49, further comprising a source of an in vitro transcription reagent or an in vitro translation reagent, wherein, during operation of the device, the in vitro transcription reagent or an in vitro translation reagent is flowed into contact with the members of the first population.
51. The device of claim 50, wherein the members of the first population are fixed at the one or more sources of one or more nucleic acids or at the one or more destination regions.
52. The device or integrated system of claim 49, wherein the nucleic acid movement means comprises one or more movement means selected from: a fluid pressure modulator, an electrokinetic fluid force modulator, a thermokinetic modulator, a capillary flow mechanism, a centrifugal force modulator, a robotic armature, a pipettor, a conveyor mechanism, a peristaltic pump or mechanism, a magnetic field generator, an electric field generator, and one or more fluid flow path.
53. The device or integrated system of claim 48, the one or more sources of nucleic acids, or the one or more population destination regions comprising one or more member selected from: a solid phase array, a liquid phase array, a container, a microtiter tray, a microtiter tray well, a microfluidic component, a microfluidic chip, a test tube, a centrifugal rotor, a microscope slide, an organism, a cell, a tissue, a liposome, a detergent particle, and a combination thereof.
54. The device or integrated system of claim 45, wherein, during operation of the device, the first population of nucleic acids is arranged into one or more physical or logical recombinant nucleic acid arrays.
55. The device or integrated system of claim 54, further comprising a duplicate of at least one of the one or more physical or logical recombinant nucleic acid arrays.
56. The device or integrated system of claim 45 or 54, further comprising one or more recombination modules which move one or more members of the first population of nucleic acids into contact with one another, thereby facilitating recombination of the first population of nucleic acids.
57. The device or integrated system of claim 1, further comprising one or more reaction mixture arraying modules, which arraying modules move one or more of the one or more shuffled nucleic acids or the one or more transcribed shuffled nucleic acids or the in vitro translation reactant components into one or more spatial positions, thereby placing the one or more shuffled nucleic acids or the one or more transcribed shuffled nucleic acids or the in vitro translation reactant component into locations in the array of reaction mixtures.
58. The device or integrated system of claim 1, further comprising a shuffled nucleic acid master array, which master array physically or logically corresponds to positions of the shuffled nucleic acids in the reaction mixture array.
59. The device or integrated system of claim 58, further comprising a nucleic acid amplification module, which module amplifies members of the shuffled nucleic acid master array, or a duplicate thereof.
60. The device or integrated system of claim 59, the amplification module comprising a heating or cooling element.
61. The device or integrated system of claim 59, the amplification module comprising a DNA micro-amplifier.
62. The device or integrated system of claim 59, the amplification module comprising a DNA micro-amplifier, the micro-amplifier comprising one or more of: a programmable resistor, a micromachined zone heating chemical amplifier, a Peltier solid state heat pump, a heat pump, a heat exchanger, a hot air blower, a resistive heater, a refrigeration unit, a heat sink, or a Joule-Thomson cooling device.
63. The device or integrated system of claim 59, further comprising a duplicate amplified array, which duplicate amplified array comprises amplicons of the nucleic acid master array, or a duplicate thereof.
64. The device or integrated system of claim 58, wherein, during operation of the device, the array of reaction mixtures produces an array of reaction mixture products, the device or integrated system further comprising one or more product identification or purification modules, which product identification modules identify one or more members of the array of reaction products.
65. The device or integrated system of claim 64, wherein the product identification or purification modules comprise one or more of: a gel, a polymeric solution, a liposome, a microemulsion, a microdroplet, an affinity matrix, a plasmon resonance detector, a BIACORE, a GC detector, an ultraviolet or visible light sensor, an epifluorescence detector, a fluorescence detector, a fluorescent array, a CCD, a digital imager, a scanner, a confocal imaging device, an optical sensor, a FACS detector, a micro-FACS unit, a temperature sensor, a mass spectrometer, a stereo-specific product detector, an ELISA reagent, an enzyme, an enzyme substrate, an antibody, an antigen, a refractive index detector, a polarimeter, a pH detector, a pH-stat device, an ion selective sensor, a calorimeter, a film, a radiation sensor, a Geiger counter, a scintillation counter, a particle counter, an H₂O₂ detection system, an electrochemical sensor, ion/gas selective electrodes, and capillary electrophoresis.
66. The device or integrated system of claim 64, wherein the one or more reaction product array members are moved into proximity to the product identification module, or wherein the product identification module performs an xyz translation, thereby moving the product identification module proximal to the array of reaction products.
67. The device or integrated system of claim 66, wherein the one or more reaction product array members are flowed into proximity to the product identification module, wherein an in-line purification system purifies the one or more reaction product array members from associated materials.
68. The device or integrated system of claim 64, wherein the reaction products comprise one or more polypeptide, one or more nucleic acid, one or more catalytic RNA, or one or more biologically active RNA.
69. The device or integrated system of claim 68, wherein the one or more catalytic RNA is a ribozyme, or wherein the biologically active RNA is an anti-sense RNA.
70. The device or integrated system of claim 68, wherein the device further comprises a source of one or more lipid, which one or more lipid is flowed into contact with the one or more polypeptide, or wherein the lipid is flowed into contact with the physical or logical array of reaction mixtures, or wherein the lipid is flowed into contact with the one or more transcribed shuffled or mutagenized nucleic acids, thereby producing one or more liposomes or micelles comprising the polypeptide, reaction mixture components, or one or more transcribed shuffled or mutagenized nucleic acids.
71. The device or integrated system of claim 64, wherein the reaction products comprise one or more polypeptide and wherein the device further comprises one or more protein refolding reagent, which refolding reagent is flowed into contact with the one or more polypeptide.
72. The device or integrated system of claim 71, wherein the refolding reagent comprises one or more of: guanidine, guanidinium, urea, a detergent, a chelating agent, DTT, DTE, or a chaperonin.
73. The device or integrated system of claim 64, the product identification or purification modules comprising one or more of: a protein detector, or protein purification means.
74. The device or integrated system of claim 64, the product identification or purification modules comprising an instruction set for discriminating between members of the array of reaction products based upon one or more of: a physical characteristic of the members, an activity of the members, or concentrations of the members.
75. The device or integrated system of claim 64, further comprising a secondary product array produced by re-arraying members of the reaction product array such that the secondary product array has a selected concentration of product members in the secondary product array.
76. The device or integrated system of claim 75, wherein the selected concentration is approximately the same for a plurality of product members in the secondary product array.
77. The device or integrated system of claim 64, further comprising an instruction set for determining a correction factor which accounts for variation in polypeptide concentration at different positions in the amplified physical or logical array of polypeptides.
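By way of illustration only, the correction factor recited in claim 77 could be computed along the lines of the following Python sketch. The function names, the choice of the mean concentration as the reference, and the assumption that a measured signal scales linearly with polypeptide concentration are hypothetical and are not taken from the specification.

```python
def correction_factors(concentrations, reference=None):
    """Return a per-position factor that rescales each well to a common
    reference concentration (assumed linear signal/concentration relation).

    concentrations: dict mapping an array position to a measured
    polypeptide concentration. Positions with zero concentration are
    skipped because no finite factor exists for them.
    """
    if reference is None:
        # Assumption: use the mean concentration across the array.
        reference = sum(concentrations.values()) / len(concentrations)
    return {pos: reference / c for pos, c in concentrations.items() if c > 0}


def corrected_signals(signals, factors):
    """Apply the correction factors to raw activity signals."""
    return {pos: signals[pos] * f for pos, f in factors.items() if pos in signals}
```

For example, a well holding twice the reference concentration receives a factor of 0.5, so that its raw signal is halved before wells are compared.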
78. The device or integrated system of claim 64 or 75, further comprising a substrate addition module, which substrate addition module adds one or more substrate to a plurality of members of the product array or the secondary product array.
79. The device of claim 78, further comprising a substrate conversion detector which monitors formation of a product produced by contact between the one or more substrate and one or more of the plurality of members of the product array or the secondary product array.
80. The device of claim 79, wherein formation of the product or disappearance of substrate is monitored indirectly.
81. The device of claim 79, wherein formation of the product or disappearance of substrate is monitored by monitoring loss of the substrate over time.
82. The device of claim 79, wherein formation of the product or disappearance of substrate is monitored enantioselectively, regioselectively or stereoselectively.
83. The device of claim 82, wherein formation of the product or disappearance of substrate is monitored by adding at least one isomer, enantiomer or stereoisomer in substantially pure form, which substantially pure form is independent of other potential isomers.
84. The device of claim 79, wherein formation of the product is monitored by detecting formation of peroxide, protons, or halides, or reduced or oxidized cofactors.
85. The device of claim 79, wherein formation of the product is monitored by detecting changes in heat or entropy which result from contact between the substrate and the product, or by detecting changes in mass, charge, fluorescence, epifluorescence, by chromatography, luminescence or absorbance, of the substrate or the product, which result from contact between the substrate and the product.
86. The device or integrated system of claim 64, the device or integrated system further comprising an array correspondence module, which array correspondence module identifies, determines or records the location of an identified product in the array of reaction mixture products which is identified by the one or more product identification modules, or which array correspondence module determines or records the location of at least a first nucleic acid member of the shuffled nucleic acid master array, or a duplicate thereof, or of an amplified duplicate array, which member corresponds to the location of one or more member of the array of reaction products.
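By way of illustration only, the record keeping performed by the array correspondence module of claim 86 could be sketched in Python as a lookup from product-array positions back to master-array positions. The class names, the plate/row/column addressing, and the score threshold are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Well:
    """One position in a physical or logical array (hypothetical layout)."""
    plate: str
    row: str
    column: int


class ArrayCorrespondence:
    """Records which master-array well each product-array well came from."""

    def __init__(self):
        self._product_to_master = {}

    def record(self, product_well, master_well):
        # Store the correspondence when an aliquot is copied or re-arrayed.
        self._product_to_master[product_well] = master_well

    def master_for(self, product_well):
        # Look up the nucleic acid source for an identified product.
        return self._product_to_master[product_well]

    def select_hits(self, scores, threshold):
        """Return master wells whose products scored at or above threshold."""
        hits = {
            self._product_to_master[well]
            for well, score in scores.items()
            if score >= threshold and well in self._product_to_master
        }
        return sorted(hits, key=lambda w: (w.plate, w.row, w.column))
```

A secondary selection step could then pick nucleic acids for further recombination from the wells returned by `select_hits`.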
87. The device or integrated system of claim 73, further comprising one or more secondary selection module, which secondary selection module selects at least the first member for further recombination, which selection is based upon the location of a product identified by the product identification modules.
88. The device or integrated system of claim 64, further comprising a screening or selection module, the module comprising one or more of: an array reader, which reader detects one or more member of the array of reaction products; an enzyme which converts one or more member of the array of reaction products into one or more detectable products; a substrate which is converted by the one or more member of the array of reaction products into one or more detectable products; a cell which produces a detectable signal upon incubation with the one or more member of the array of reaction products; a reporter gene which is induced by one or more member of the array of reaction products; a promoter which is induced by one or more member of the array of reaction products, which promoter directs expression of one or more detectable products; or an enzyme or receptor cascade which is induced by the one or more member of the array of reaction products.
89. The device or integrated system of claim 87, further comprising a secondary recombination module, which module physically contacts the first member, or an amplicon thereof, to an additional member of the shuffled nucleic acid master array, or the duplicate thereof, or the amplified duplicate array, thereby permitting physical recombination between the first and additional members.
90. The device or integrated system of claim 1, further comprising a DNA fragmentation module and a recombination region, which DNA fragmentation module comprises one or more of: a nuclease, a mechanical shearing device, a polymerase, a random primer, a directed primer, a nucleic acid cleavage reagent, a chemical nucleic acid chain terminator, or an oligonucleotide synthesizer, wherein, during operation of the device, fragmented DNAs produced in the DNA fragmentation module are recombined in the recombination region to produce the one or more shuffled nucleic acids.
91. The device or integrated system of claim 1, further comprising a module which performs one or more of: error prone PCR, site saturation mutagenesis, or site-directed mutagenesis.
92. The device or integrated system of claim 1, further comprising a data structure embodied in a computer, an analog computer, a digital computer, or a computer readable medium, which data structure corresponds to the one or more shuffled nucleic acids.
93. The device or integrated system of claim 1, wherein the one or more reaction mixtures comprise one or more shuffled nucleic acids arranged in a microtiter tray at an average of approximately 0.1-100 shuffled nucleic acids per well.
94. The device or integrated system of claim 1, wherein the one or more reaction mixtures comprise one or more shuffled nucleic acids arranged in a microtiter tray at an average of approximately 1-5 shuffled nucleic acids per well.
95. The device or integrated system of claim 1, further comprising a diluter, which diluter pre-dilutes the concentration of the one or more shuffled or mutated nucleic acids prior to addition of the shuffled or mutant nucleic acids to the reaction mixtures.
96. The device or integrated system of claim 95, wherein the concentration of the one or more shuffled nucleic acids is about 0.01 to 100 molecules per microliter.
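By way of illustration only, the relationship between a pre-diluted concentration (claim 96) and an average number of shuffled nucleic acids per well (claims 93 and 94) can be sketched in Python. The sketch assumes that molecules are dispensed at random, so that the number per well follows a Poisson distribution; the function names and the dispense volume parameter are hypothetical and are not taken from the specification.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k molecules in a well for a given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def well_occupancy(molecules_per_ul, dispense_volume_ul):
    """Estimate well occupancy after dispensing a pre-diluted sample.

    Assumes random (Poisson) partitioning of molecules between wells.
    """
    mean = molecules_per_ul * dispense_volume_ul
    p_empty = poisson_pmf(0, mean)
    p_single = poisson_pmf(1, mean)
    return {
        "mean_per_well": mean,
        "empty": p_empty,
        "single": p_single,
        "multiple": 1.0 - p_empty - p_single,
    }
```

Under these assumptions, dispensing 10 microliters at 0.1 molecules per microliter gives an average of one molecule per well, with roughly 37% of wells empty, 37% holding a single molecule, and the remainder holding more than one.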
97. The device or integrated system of claim 1, wherein the reaction mixtures are produced by adding the in vitro translation reactant and, optionally, an in vitro transcription reagent, to a duplicate shuffled or mutated nucleic acid array, which duplicate shuffled or mutated nucleic acid array is duplicated from a master array of the shuffled or mutated nucleic acids produced by spatially or logically separating members of a population of the shuffled or mutated nucleic acids to produce a physical or logical array of the shuffled or mutated nucleic acids, by one or more arraying technique selected from: (i) lyophilizing members of the population of shuffled nucleic acids on a solid surface, thereby forming a solid phase array; (ii) chemically coupling members of the population of shuffled nucleic acids to a solid surface, thereby forming a solid phase array; (iii) rehydrating members of the population of shuffled nucleic acids on a solid surface, thereby forming a liquid phase array; (iv) cleaving chemically coupled members of the population of shuffled nucleic acids from a solid surface, thereby forming a liquid phase array; (v) accessing one or more physically separated logical array members from one or more sources of shuffled nucleic acids and flowing the physically separated logical array members to one or more destination, the one or more destinations constituting a logical array of the shuffled nucleic acids; and, (vi) printing members of a population of shuffled nucleic acids onto a solid material to form a solid phase array.
98. The device or integrated system of claim 1, wherein the one or more shuffled nucleic acids are produced by synthesizing a set of overlapping oligonucleotides, or by cleaving a plurality of homologous nucleic acids to produce a set of cleaved homologous nucleic acids, or both, and permitting recombination to occur between the set of overlapping oligonucleotides, the set of cleaved homologous nucleic acids, or both the set of overlapping oligonucleotides and the set of cleaved homologous nucleic acids.
99. The device or integrated system of claim 1, wherein greater than about 1% of the physical or logical array of reaction mixtures comprise shuffled or mutant nucleic acids having one or more base changes relative to a parental nucleic acid.
100. A diversity generation device, comprising (i) a programmed thermocycler; and, (ii) a fragmentation module operably coupled to the programmed thermocycler.
101. The diversity generation device of claim 100, wherein the programmed thermocycler comprises a thermocycler operably coupled to a computer, which computer comprises one or more instruction set, which one or more instruction set does one or more of: calculates an amount of uracil and an amount of thymidine for use in the programmed thermocycler; calculates one or more crossover region between two or more parental nucleotides; calculates an annealing temperature; calculates an extension temperature; or selects one or more parental nucleic acid sequence.
102. The diversity generation device of claim 101, wherein the one or more instruction set receives user input data and sets up one or more cycle to be performed by the programmed thermocycler.
103. The diversity generation device of claim 102, wherein the input data comprises one or more of: one or more parental nucleic acid sequence, a desired crossover frequency, an extension temperature, or an annealing temperature.
104. The diversity generation device of claim 101, wherein the one or more instruction set calculates the amount of uracil and the amount of thymidine based on a desired fragment size.
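By way of illustration only, one way an instruction set might relate a desired fragment size to the proportions of uracil and thymidine (claim 104) is sketched below in Python. The model is a deliberate simplification and is an assumption, not the method of the specification: it treats a single strand, supposes that uracil and thymidine are incorporated with equal efficiency, and supposes that every incorporated uracil is cleaved. The function name is hypothetical.

```python
def uracil_fraction(sequence, desired_fragment_size):
    """Return the fraction u = U / (U + T) expected to give the desired
    mean fragment size under the simplified model described above.

    With thymidine positions occurring at frequency f_T, a cut is expected
    on average every 1 / (f_T * u) bases, so u = 1 / (f_T * size).
    """
    seq = sequence.upper()
    t_fraction = seq.count("T") / len(seq)
    if t_fraction == 0:
        raise ValueError("sequence contains no thymidine positions")
    u = 1.0 / (t_fraction * desired_fragment_size)
    if u > 1.0:
        raise ValueError("desired fragment size is below the mean T spacing")
    return u
```

The corresponding thymidine fraction is `1 - u`; an empirical calibration would be expected to replace the equal-incorporation assumption in practice.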
105. The diversity generation device of claim 103, wherein the one or more instruction set directs the one or more cycle on the diversity generation device, which one or more cycle: (a) amplifies the one or more parental nucleic acid sequence; (b) fragments the one or more parental nucleic acid sequence to produce one or more nucleic acid fragment; (c) reassembles the one or more nucleic acid fragment to produce one or more shuffled nucleic acid; and, (d) amplifies the one or more shuffled nucleic acid.
106. The diversity generation device of claim 105, wherein step (a) comprises amplifying the one or more parental nucleic acid sequence in the presence of uracil.
107. The diversity generation device of claim 105, wherein the one or more cycle pauses between step (a) and step (b) to allow addition of one or more fragmentation reagent.
108. The diversity generation device of claim 101, wherein the one or more instruction set performs one or more calculation based on one or more theoretical prediction of a nucleic acid melting temperature or on one or more set of empirical data, which empirical data comprises a comparison of one or more nucleic acid melting temperature.
109. The diversity generation device of claim 105, wherein the one or more instruction set instructs the fragmentation module to fragment the parental nucleic acids to produce one or more nucleic acid fragments having a desired mean fragment size.
110. The diversity generation device of claim 100, wherein the programmed thermocycler comprises a thermocycler and software for performing one or more shuffling calculations, which software is embodied on a web page or is installed directly in the thermocycler.
111. The diversity generation device of claim 100, wherein the fragmentation module fragments one or more parental nucleic acids by sonication, DNase II digestion, random primer extension, or uracil incorporation and treatment with one or more uracil cleavage enzyme.
112. A diversity generation device comprising: (i) a computer, which computer comprises at least a first instruction set for creating one or more nucleic acid fragment sequence from one or more parental nucleic acid sequence; (ii) a synthesizer module, which synthesizer module synthesizes the one or more nucleic acid fragment sequence; and, (iii) a thermocycler, which thermocycler generates one or more diverse sequence from the one or more nucleic acid fragment sequence.
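By way of illustration only, a first instruction set of the kind recited in claim 112 might derive fragment sequences from a parental sequence by tiling it into overlapping oligonucleotides, as in the following Python sketch. The function name, the fixed oligonucleotide length, and the fixed overlap are hypothetical; the sketch emits top-strand sequences only, whereas a practical assembly design would typically alternate strands and balance overlap melting temperatures.

```python
def tile_oligos(parent, oligo_length, overlap):
    """Split a parental sequence into oligos that overlap their neighbors.

    Each oligo starts (oligo_length - overlap) bases after the previous
    one; the final oligo may be shorter than oligo_length.
    """
    if not 0 <= overlap < oligo_length:
        raise ValueError("overlap must be in the range [0, oligo_length)")
    step = oligo_length - overlap
    oligos = []
    start = 0
    while True:
        oligos.append(parent[start:start + oligo_length])
        if start + oligo_length >= len(parent):
            break
        start += step
    return oligos
```

Joining the first oligo with the non-overlapping tail of each later oligo reproduces the parental sequence, which gives a simple consistency check on the tiling.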
113. The diversity generation device of claim 112, wherein the first instruction set limits or expands diversity of the one or more nucleic acid fragment sequence by adding or removing one or more amino acid having similar diversity; selecting a frequently used amino acid at one or more specific position; using one or more sequence activity calculation; using a calculated overlap with one or more additional oligonucleotide; based on an amount of degeneracy, or based on a melting temperature.
114. The diversity generation device of claim 112, wherein the synthesizer module comprises a microarray oligonucleotide synthesizer.
115. The diversity generation device of claim 114, wherein the synthesizer module comprises an ink-jet printer head based oligonucleotide synthesizer.
116. The diversity generation device of claim 112, wherein the synthesizer module synthesizes the one or more nucleic acid fragment sequences on a solid support.
117. The diversity generation device of claim 112, wherein the synthesizer module uses one or more mononucleotide coupling reactions or one or more trinucleotide coupling reactions to synthesize the one or more nucleic acid fragment sequence.
118. The diversity generation device of claim 112, wherein the thermocycler performs an assembly/rescue PCR reaction.
119. The diversity generation device of claim 118, wherein the computer comprises at least a second instruction set, which second instruction set determines at least a first set of conditions for the assembly/rescue PCR reaction.
120. The diversity generation device of claim 112, the device further comprising a screening module for screening the one or more diverse sequence for a desired characteristic.
121. The diversity generation device of claim 120, wherein the screening module comprises a high-throughput screening module.
122. A diversity generation kit comprising: (i) the diversity generation device of claim 100 or claim 112; and, (ii) one or more reagent for diversity generation.
 123. The diversity generation kit ofclaim 122, wherein the reagents comprise E coli., a PCR reaction mixturecomprising a mixture of uracil and thymidine, one or more uracilcleaving enzyme, and a PCR reaction mixture comprising standard dNTPs.124. The diversity generation kit of claim 123, wherein the one or moreuracil cleaving enzyme comprises a uracil glycosidase and anendonuclease.
 125. The diversity generation kit of claim 123, whereinthe mixture of uracil and thymidine comprises a desired ratio of uracilto thymidine, which desired ratio is calculated by the diversitygeneration device.
 126. The diversity generation kit of claim 122,wherein the one or more reagents for diversity generation comprise atleast a first artificially evolved enzyme. The diversity generation kitof claim 126, wherein the at least first artificially evolved enzymecomprises an artificially evolved polymerase.
127. The diversity generation kit of claim 122, further comprising one or more of: packaging materials, a container adapted to receive the device or reagent, or instructional materials for use of the device.
128. A method of processing shuffled or mutagenized nucleic acids, the method comprising: (a) providing a physical or logical array of reaction mixtures, a plurality of the reaction mixtures comprising one or more member of a first population of nucleic acids, the first population of nucleic acids comprising one or more shuffled nucleic acids, or one or more transcribed shuffled nucleic acids, or one or more mutagenized nucleic acid or one or more transcribed mutagenized nucleic acids wherein a plurality of the plurality of reaction mixtures further comprise an in vitro translation reactant; and, (b) detecting one or more in vitro translation products produced by a plurality of members of the physical or logical array of reaction mixtures.
129. The physical or logical array of reaction mixtures produced by the method of claim 128.
130. The method of claim 128, wherein the array of reaction mixtures comprises a solid phase or a liquid phase array of one or more of: the one or more shuffled or mutagenized nucleic acids, the one or more transcribed shuffled nucleic acids, or the one or more in vitro translation reagents.
131. The method of claim 128, wherein the one or more shuffled nucleic acids or the one or more mutagenized nucleic acids are homologous.
132. The method of claim 128, wherein the one or more transcribed shuffled nucleic acid or the one or more transcribed mutagenized nucleic acid is an mRNA, a catalytic RNA or a biologically active RNA.
133. The method of claim 128, wherein the one or more in vitro translation reagents comprise one or more of: a reticulocyte lysate, a rabbit reticulocyte lysate, a wheat germ in vitro translation mixture, or an E. coli lysate.
134. The method of claim 128, further comprising providing one or more in vitro transcription reagents to the plurality of members of the physical or logical array of reaction mixtures.
135. The method of claim 134, wherein the in vitro transcription reagents comprise one or more of: a HeLa nuclear extract in vitro transcription component, an SP6 polymerase, a T3 polymerase or a T7 RNA polymerase.
136. The method of claim 128, wherein the one or more shuffled nucleic acids are produced in an automatic DNA shuffling module, the method comprising inputting DNAs or character strings corresponding to input DNAs into the DNA shuffling module and accepting output DNAs from the DNA shuffling module, which output DNAs comprise the one or more shuffled nucleic acids in the reaction mixture array.
137. The method of claim 136, comprising fragmenting the input DNA in the DNA shuffling module to produce DNA fragments, or providing the input DNAs to comprise cleaved or synthetic DNA fragments.
138. The method of claim 136 or 137, comprising purifying DNA fragments of a selected length in the DNA shuffling module.
139. The method of claim 138, comprising hybridizing the resulting purified DNA fragments and elongating the resulting hybridized DNA fragments with a polymerase.
140. The method of claim 139, further comprising separating, identifying, cloning or purifying the resulting elongated DNAs.
141. The method of claim 139, further comprising determining a recombination frequency or a length, or both a recombination frequency and a length for the resulting elongated DNAs.
142. The method of claim 139, further comprising determining a length of the resulting elongated DNAs by detecting incorporation of one or more labeled nucleic acid or nucleotide into the elongated DNAs.
143. The method of claim 142, wherein the label is a dye, radioactive label, or a fluorophore.
144. The method of claim 139, comprising determining the length of the resulting elongated DNAs with a fluorogenic 5′ nuclease assay.
145. The method of claim 139, comprising flowing a shuffling reagent or product through a microscale channel in the DNA shuffling module.
146. The method of claim 139, wherein the DNA fragments are contacted in a single pool.
147. The method of claim 139, wherein the DNA fragments are contacted in multiple pools.
148. The method of claim 139, further comprising dispensing the resulting elongated DNAs into one or more multiwell plates.
149. The method of claim 139, further comprising dispensing the resulting elongated DNAs into one or more multiwell plates at a selected density per well of the elongated DNAs.
150. The method of claim 139, further comprising dispensing the resulting elongated DNAs into one or more master multiwell plates and PCR amplifying the resulting master array of elongated nucleic acids to produce an amplified array of elongated nucleic acids, the shuffling module comprising an array copy system which transfers aliquots from the wells of the one or more master multiwell plates to one or more copy multiwell plates.
151. The method of claim 150, comprising determining an extent of PCR amplification by one or more technique selected from: incorporation of a label into one or more amplified elongated nucleic acid, and applying a fluorogenic 5′ nuclease assay.
152. The method of claim 150, wherein the array of reaction mixtures is formed by separate or simultaneous addition of an in vitro transcription reagent and an in vitro translation reactant to the one or more copy multiwell plates, or to a duplicate set thereof, wherein the elongated DNAs comprise the one or more shuffled nucleic acids.
153. The method of claim 128, wherein the array of reaction mixtures produces an array of reaction mixture products.
154. The method of claim 153, wherein the reaction products comprise one or more polypeptide.
155. The method of claim 153, wherein the reaction products comprise one or more polypeptide, the method further comprising re-folding the one or more polypeptide by contacting the one or more polypeptide with a refolding reagent.
156. The method of claim 155, wherein the refolding reagent comprises one or more of: guanidine, urea, DTT, DTE, or a chaperonin.
157. The method of claim 153, comprising moving the one or more reaction product array members into proximity to a product identification module, or moving a product identification module into proximity to the reaction product array members.
158. The method of claim 153, wherein the one or more reaction product array members are flowed into proximity to a product identification module, the method further comprising in-line purification of the one or more reaction product array members.
159. The method of claim 153, further comprising contacting the one or more polypeptide with one or more lipid to produce one or more liposome or micelle, which liposome or micelle comprises the one or more polypeptide.
160. The method of claim 153, further comprising one or more of: reading the array of reaction mixture products with an array reader, which reader detects one or more member of the array of reaction products; converting one or more member of the array of reaction products with an enzyme into one or more detectable products; converting one or more substrates by the one or more member of the array of reaction products into one or more detectable products; contacting a cell to one or more member of the array of reaction products, which cell or reaction product, or both, produce a detectable signal upon contacting the one or more member of the array of reaction products; inducing a reporter gene with one or more member of the array of reaction products; inducing a promoter with one or more member of the array of reaction products, which promoter directs expression of one or more detectable products; or inducing an enzyme or receptor cascade with one or more member of the array of reaction products, which cascade is induced by the one or more member of the array of reaction products.
161. A method of recombining members of a physical or logical array of nucleic acids, the method comprising: (a) providing at least a first population of nucleic acids, or (b) providing a data structure comprising character strings corresponding to the first population of nucleic acids; (c) recombining one or more members of the first population of nucleic acids, thereby providing a first population of recombinant nucleic acids, or (d) recombining one or more of the character strings corresponding to one or more members of the first population of nucleic acids, thereby providing a population of character strings corresponding to the first population of recombinant nucleic acids, and converting the population of character strings corresponding to the first population of recombinant nucleic acids into the first population of recombinant nucleic acids, thereby providing the first population of recombinant nucleic acids; (e) spatially or logically separating members of the population of recombinant nucleic acids to produce a physical or logical array of recombinant nucleic acids and amplifying the recombinant nucleic acids in the physical or logical array of recombinant nucleic acids in vitro to provide an amplified physical or logical array of recombinant nucleic acids, or, (f) in vitro amplifying members of the population of recombinant nucleic acids and physically or logically separating the population of recombinant nucleic acids to produce an amplified physical or logical array of recombinant nucleic acids.
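By way of illustration only, and without limiting the preceding claim, recombining character strings that correspond to nucleic acids (step (d)) can be pictured as exchanging segments between two aligned strings at chosen crossover positions. The following Python sketch assumes two parental strings of equal length; the function name `recombine` and its parameters are hypothetical and do not appear in the specification.

```python
def recombine(parent_a: str, parent_b: str, crossover_points: list[int]) -> str:
    """Build one recombinant string by alternating between two aligned parents.

    Assumption for illustration: the parents are pre-aligned and of equal
    length, and each crossover point is a 0-based index at which the
    template switches from one parent to the other.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parental strings must be aligned to equal length")
    if any(p < 0 or p > len(parent_a) for p in crossover_points):
        raise ValueError("crossover point outside the parental strings")

    parents = (parent_a, parent_b)
    current = 0      # index of the parent currently being copied
    previous = 0     # start of the segment being copied
    segments = []
    for point in sorted(crossover_points) + [len(parent_a)]:
        segments.append(parents[current][previous:point])
        current = 1 - current   # switch template at each crossover
        previous = point
    return "".join(segments)


if __name__ == "__main__":
    # Two crossovers produce a child with segments from A, then B, then A.
    print(recombine("AAAAAAAA", "CCCCCCCC", [3, 6]))
```

The resulting string would still have to be converted into a physical nucleic acid, for example by synthesis, as the claim recites; that step is outside the scope of this sketch.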
162. The method of claim 161, further comprising: (g) screening the amplified physical or logical array of recombinant nucleic acids, or a duplicate thereof, for a desired property.
163. The method of claim 161, wherein the data structure is embodied in a computer, an analog computer, a digital computer, or a computer readable medium.
164. The method of claim 161, wherein spatially or logically separating members of the population of recombinant nucleic acids to produce a physical or logical array of recombinant nucleic acids or amplified recombinant nucleic acids comprises plating the nucleic acids in a microtiter tray at an average of approximately 0.1-10 array members per well.
165. The method of claim 161, wherein spatially or logically separating members of the population of recombinant nucleic acids to produce a physical or logical array of recombinant nucleic acids comprises plating the nucleic acids in a microtiter tray at an average of approximately 1-5 array members per well.
166. The method of claim 161, wherein spatially or logically separating the members of the population of recombinant nucleic acids comprises diluting the members of the population with a buffer.
167. The method of claim 161, wherein the concentration of the population of recombinant nucleic acids is about 0.01 to 100 molecules per microliter.
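By way of illustration only, and without limiting the preceding claims, the average number of array members per well and the concentration in molecules per microliter can be related by simple arithmetic. If members are assumed to distribute independently and at random among wells, the number of members in a well is approximately Poisson distributed with a mean equal to the average loading. The Python sketch below rests on that assumption; the function names are hypothetical and do not appear in the specification.

```python
import math


def well_occupancy(mean_per_well: float) -> dict[str, float]:
    """Estimate fractions of empty, single-member and multi-member wells.

    Assumes random, independent loading, so counts per well follow a
    Poisson distribution with the given mean.
    """
    if mean_per_well < 0:
        raise ValueError("mean_per_well must be non-negative")
    p_empty = math.exp(-mean_per_well)
    p_single = mean_per_well * p_empty
    return {
        "empty": p_empty,
        "single": p_single,
        "multiple": 1.0 - p_empty - p_single,
    }


def volume_per_well_ul(mean_per_well: float, molecules_per_ul: float) -> float:
    """Volume (microliters) to dispense so a well receives the target mean."""
    if molecules_per_ul <= 0:
        raise ValueError("molecules_per_ul must be positive")
    return mean_per_well / molecules_per_ul


if __name__ == "__main__":
    for mean in (0.1, 1.0, 5.0, 10.0):
        print(mean, well_occupancy(mean))
    # Dispensing volume for an average of 1 member per well at 0.5 molecules/uL
    print(volume_per_well_ul(1.0, 0.5))
```

Under this assumption, low averages leave most wells empty while high averages place several members in most wells, which is one reason a range of loadings may be of interest.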
168. The method of claim 161, wherein spatially or logically separating members of the population of recombinant nucleic acids to produce a physical or logical array of recombinant nucleic acids comprises one or more of: (i) lyophilizing members of the population of recombinant nucleic acids on a solid surface, thereby forming a solid phase array; (ii) chemically coupling members of the population of recombinant nucleic acids to a solid surface, thereby forming a solid phase array; (iii) rehydrating members of the population of recombinant nucleic acids on a solid surface, thereby forming a liquid phase array; (iv) cleaving chemically coupled members of the population of recombinant nucleic acids from a solid surface, thereby forming a liquid phase array; or, (v) accessing one or more physically separated logical array members from one or more sources of recombinant nucleic acids and flowing the physically separated logical array members to one or more destination.
169. A method of recombining members of a physical or logical array of nucleic acids, the method comprising: (a) providing at least a first population of nucleic acids arranged in a physical or logical array; (b) recombining one or more members of the first population of nucleic acids with one or more additional nucleic acid, thereby providing a first physical or logical array comprising a population of recombinant nucleic acids; (c) amplifying the recombinant nucleic acids in the physical or logical array of recombinant nucleic acids in vitro to provide an amplified physical or logical array of recombinant nucleic acids; and, (g) screening the first or amplified physical or logical array of recombinant nucleic acids, or a duplicate thereof, for a desired property.
170. The method of claim 128 or 169, wherein the first population of nucleic acids or the population of recombinant nucleic acids are arranged in a physical or logical matrix at an average of approximately 0.1-10 array members per array position.
171. The method of claim 128 or 169, wherein the first population of nucleic acids or the population of recombinant nucleic acids are arranged in a physical or logical matrix at an average of approximately 0.5-5 array members per array position.
172. The method of claim 128 or 169, wherein the first population of nucleic acids or the population of recombinant nucleic acids comprise a solid phase or a liquid phase array.
173. The method of claim 128 or 169, wherein the first population of nucleic acids is provided by one or more of: synthesizing a set of overlapping oligonucleotides, cleaving a plurality of homologous nucleic acids to produce a set of cleaved homologous nucleic acids, step PCR of one or more target nucleic acid, uracil incorporation and cleavage during copying of one or more target nucleic acids, and incorporation of a cleavable nucleic acid analogue into a target nucleic acid and cleavage of the resulting target nucleic acid; or, wherein the set of overlapping oligonucleotides or the set of cleaved homologous nucleic acids are flowed into one or more selected physical locations.
174. The method of claim 128, 161 or 169, wherein the first population of nucleic acids is provided by synthesizing a set of overlapping oligonucleotides, by cleaving a plurality of homologous nucleic acids to produce a set of cleaved homologous nucleic acids, or both.
175. The method of claim 128, 161 or 169, wherein the first population of nucleic acids is provided by sonicating, cleaving, partially synthesizing, random primer extending or directed primer extending one or more of: a synthetic nucleic acid, a DNA, an RNA, a DNA analogue, an RNA analogue, a genomic DNA, a cDNA, an mRNA, a DNA generated by reverse transcription, an nRNA, an aptamer, a polysome associated nucleic acid, a cloned nucleic acid, a cloned DNA, a cloned RNA, a plasmid DNA, a phagemid DNA, a viral DNA, a viral RNA, a YAC DNA, a cosmid DNA, a fosmid DNA, a BAC DNA, a P1-mid, a phage DNA, a single-stranded DNA, a double-stranded DNA, a branched DNA, a catalytic nucleic acid, an antisense nucleic acid, an in vitro amplified nucleic acid, a PCR amplified nucleic acid, an LCR amplified nucleic acid, a Qβ-replicase amplified nucleic acid, an oligonucleotide, a nucleic acid fragment, a restriction fragment or a combination thereof.
176. The method of claim 175, wherein the first population of nucleic acids is further provided by purifying one or more member of the first population of nucleic acids.
177. The method of claim 128, 161 or 169, wherein the first population of nucleic acids is provided by transporting one or more members of the population from one or more sources of one or more members of the first population to one or more destinations of the one or more members of the first population of nucleic acids.
178. The method of claim 177, wherein said transporting comprises flowing the one or more members from the source to the destination.
179. The method of claim 177, the one or more sources of nucleic acids comprising one or more of: a solid phase array, a liquid phase array, a container, a microtiter tray, a microtiter tray well, a microfluidic chip, a test tube, a centrifugal rotor, a microscope slide, or a combination thereof.
180. The method of claim 150, 161 or 169, wherein amplifying the recombinant nucleic acids in the physical or logical array of recombinant nucleic acids, or amplifying the elongated nucleic acids in the master array comprises one or more amplification technique selected from: PCR, LCR, SDA, NASBA, TMA and Qβ-replicase amplification.
181. The method of claim 150, 161 or 169, wherein amplifying the recombinant nucleic acids in the physical or logical array or amplifying the elongated nucleic acids in the master array comprises heating or cooling the physical or logical array or the master array, or a portion thereof.
182. The method of claim 150, 161 or 169, wherein amplifying the recombinant nucleic acids in the physical or logical array or amplifying the elongated nucleic acids in the master array comprises incorporating one or more transcription or translation control subsequence into one or more of: the elongated nucleic acids, the recombinant nucleic acids in the physical or logical array, an intermediate nucleic acid produced using the elongated nucleic acids or the recombinant nucleic acids in the physical or logical array as a template, or a partial or complete copy of the elongated nucleic acids or the recombinant nucleic acids in the physical or logical array.
183. The method of claim 182, wherein the one or more transcription or translation control subsequence is ligated into one or more of: the elongated nucleic acids, the recombinant nucleic acids in the physical or logical array, an intermediate nucleic acid produced using the elongated nucleic acids or the recombinant nucleic acids in the physical or logical array as a template, or a partial or complete copy of the elongated nucleic acids or the recombinant nucleic acids in the physical or logical array.
184. The method of claim 182, wherein the one or more transcription or translation control subsequence is hybridized or partially hybridized to one or more of: the elongated nucleic acids, the recombinant nucleic acids in the physical or logical array, an intermediate nucleic acid produced using the elongated nucleic acids or the recombinant nucleic acids in the physical or logical array as a template, or a partial or complete copy of the elongated nucleic acids or the recombinant nucleic acids in the physical or logical array.
185. The method of claim 181, wherein the recombinant nucleic acids in the physical or logical array or the elongated nucleic acids in the master array are amplified in a DNA micro-amplifier.
186. The method of claim 185, wherein the micro-amplifier comprises one or more of: a programmable resistor, a micromachined zone heating chemical amplifier, a chemical denaturation device, an electrostatic denaturation device, or a microfluidic electrical fluid resistance heating device.
187. The method of claim 181, wherein the physical or logical array, or portion thereof or the master array or portion thereof, is heated or cooled by one or more of: a Peltier solid state heat pump, a heat pump, a resistive heater, a refrigeration unit, a heat sink, or a Joule-Thomson cooling device.
188. The method of claim 161 or 169, further comprising producing a duplicate amplified physical or logical array of recombinant nucleic acids.
189. The method of claim 162 or 169, wherein screening the amplified physical or logical array of recombinant nucleic acids, or a duplicate thereof, for a desired property comprises: assaying a protein or product nucleic acid encoded by one or more members of the amplified physical or logical array of recombinant nucleic acids for one or more property.
190. The method of claim 161 or 169, further comprising in vitro transcribing members of the amplified physical or logical array of recombinant nucleic acids to produce an amplified array of in vitro transcribed nucleic acids.
191. The method of claim 128 or 169, comprising providing a first population of single-stranded template polynucleotides, which template polynucleotides are the same or different, and recombining the template polynucleotides by: (i) annealing a plurality of partially overlapping complementary nucleic acid fragments; and, (ii) extending the annealed fragments to produce a physical or logical array comprising a first population of recombinant nucleic acids.
192. The method of claim 191, comprising providing a physical array comprising the first population of template polynucleotides immobilized on a solid support.
193. The method of claim 192, wherein the solid support comprises a glass support, a plastic support, a silicon support, a chip, a bead, a pin, a filter, a membrane, a microtiter plate, or a slide.
194. The method of claim 192, wherein the first population of template polynucleotides comprises substantially an entire genome.
195. The method of claim 194, wherein the first population of template polynucleotides comprises a bacterial or fungal genome.
196. The method of claim 192, wherein the first population of template polynucleotides comprises substantially all of the expression products of a cell, tissue or organism.
197. The method of claim 196, wherein the first population of template polynucleotides comprises the expression products of a eukaryotic cell, tissue or organism.
198. The method of claim 192, wherein the first population of template polynucleotides comprises a subset of the expression products of a cell, tissue or organism.
199. The method of claim 198, wherein the first population of template polynucleotides comprises the expression products of a eukaryotic cell, tissue or organism.
200. The method of claim 192, wherein the first population of template polynucleotides comprises a library of genomic nucleic acids or cellular expression products.
201. The method of claim 200, wherein the library of cellular expression products comprises a cDNA library.
202. The method of claim 191, wherein one or more template polynucleotides comprise one or more of a coding RNA, a coding DNA, an antisense RNA, an antisense DNA, a non-coding RNA, a non-coding DNA, an artificial RNA, an artificial DNA, a synthetic RNA, a synthetic DNA, a substituted RNA, a substituted DNA, a naturally occurring RNA, a naturally occurring DNA, a genomic RNA, a genomic DNA, or a cDNA.
203. The method of claim 161 or 169, further comprising in vitro transcribing members of the amplified physical or logical array of recombinant nucleic acids to produce an amplified array of transcribed nucleic acids and translating the amplified physical or logical array of transcribed nucleic acids to produce an amplified physical or logical array of polypeptides.
204. The method of claim 203, further comprising determining a concentration of polypeptide or transcribed nucleic acid at one or more positions in the amplified physical or logical array of polypeptides.
205. The method of claim 204, further comprising re-arraying the amplified physical or logical array of polypeptides or in vitro transcribed nucleic acids in a secondary polypeptide or in vitro transcribed nucleic acid array which has an approximately uniform concentration of polypeptides or in vitro transcribed nucleic acids at a plurality of locations in the secondary polypeptide array.
206. The method of claim 204, further comprising determining a correction factor which accounts for variation in polypeptide or in vitro transcribed nucleic acid concentrations at different positions in the amplified physical or logical array of polypeptides or in vitro transcribed nucleic acids.
207. The method of claim 203, further comprising adding one or more substrate to a plurality of members of the logical array of polypeptides or in vitro transcribed nucleic acids.
208. The method of claim 207, further comprising monitoring formation of a product produced by contact between the one or more substrate and one or more of the plurality of members of the logical array of polypeptides.
209. The method of claim 208, wherein the formation of the product is detected indirectly.
210. The method of claim 208, wherein the formation of the product is detected by a coupled enzymatic reaction which detects the product or the substrate or a secondary product of the product or substrate.
211. The method of claim 208, wherein the formation of the product is detected by monitoring peroxide production.
212. The method of claim 208, wherein the formation of the product is detected directly.
213. The method of claim 208, wherein the formation of the product is detected by monitoring production of heat or entropy which results from the formation of the product.
214. The method of claim 203, further comprising selecting the physical or logical array of polypeptides for a desired property, thereby identifying one or more selected member of the physical or logical array of polypeptides which has a desired property, thereby identifying one or more selected member of the amplified physical or logical array of recombinant nucleic acids that encodes the one or more member of the physical or logical array of polypeptides.
215. The method of claim 214, wherein selecting the physical or logical array is performed in a primary screening assay, the method further comprising one or more of: (i) re-selecting the one or more selected member of the amplified physical or logical array of recombinant nucleic acids in a secondary screening assay; (ii) quantifying protein levels at one or more location in the physical or logical array of polypeptides; (iii) purifying proteins from one or more locations in the physical or logical array of polypeptides; (iv) normalizing activity levels in the primary screen by compensating for protein quantitation at a plurality of locations in the physical or logical array of polypeptides; (v) determining a physical characteristic of the one or more selected members; or, (vi) determining an activity of the one or more selected members.
216. The method of claim 214, further comprising recombining the one or more selected member of the amplified physical or logical array of recombinant nucleic acids with one or more additional nucleic acids, in vivo, in vitro or in silico.
217. The method of claim 214, further comprising cloning or sequencing the one or more member of the amplified physical or logical array of recombinant nucleic acids.
218. The method of claim 161 or 169, further comprising selecting one or more member of the amplified physical or logical array, or a duplicate thereof, based upon the screening of the amplified physical or logical array for a desired property.
219. The method of claim 218, wherein a plurality of members of the amplified physical or logical array or duplicate thereof are selected, recombined and re-arrayed to form a secondary array of recombined selected nucleic acids, which secondary array is re-screened for the desired property, or for a second desired property.
220. A method of detecting or enriching for in vitro transcription or translation products, the method comprising: localizing one or more first nucleic acids which encode one or more moieties proximal to one or more moiety recognition agents which specifically bind the one or more moieties; in vitro translating or transcribing the one or more nucleic acids, thereby producing the one or more moieties, which one or more moieties diffuse or flow into contact with the one or more moiety recognition agents; and, permitting binding of the one or more moieties to the one or more moiety recognition agents, and detecting or enriching for the one or more moieties by detecting or collecting one or more material proximal to, within or contiguous with the moiety recognition agent which material comprises at least one of the one or more moieties, which moieties individually comprise one or more in vitro translation or transcription product.
221. The method of claim 220, further comprising pooling the one or more moieties by pooling the material which is collected.
222. The method of claim 220, wherein the one or more moieties comprise one or more polypeptides or one or more RNAs.
223. The method of claim 220, wherein one or more moiety recognition agents comprise one or more antibody or one or more second nucleic acids.
224. The method of claim 220, wherein the first nucleic acids comprise a related population of shuffled nucleic acids.
225. The method of claim 220, wherein the first nucleic acids comprise a related population of shuffled nucleic acids, which shuffled nucleic acids encode an epitope tag, which epitope tag is bound by the moiety or the one or more moiety recognition agents.
226. The method of claim 220, wherein the first nucleic acids comprise a related population of shuffled nucleic acids and a PCR primer binding region, the method further comprising PCR amplifying a set of parental nucleic acids to produce the related population of shuffled nucleic acids.
227. The method of claim 220, wherein the first nucleic acids comprise a related population of shuffled nucleic acids and a PCR primer binding region, the method further comprising identifying one or more target first nucleic acid by proximity to the moieties which are bound to the one or more moiety recognition agent, and amplifying the target first nucleic acid by hybridizing a PCR primer to the PCR primer binding region and extending the primer with a polymerase.
228. The method of claim 220, wherein the first nucleic acids comprise an inducible or constitutive heterologous promoter.
229. The method of claim 220, wherein the first nucleic acids and the one or more moiety recognition agents are localized on a solid substrate.
230. The solid substrate made by the method of claim 229.
231. The method of claim 229, wherein the solid substrate is a bead.
232. The method of claim 229, wherein the first nucleic acids and the one or more moiety recognition agents are localized on the solid substrate by one or more of: a cleavable linker, a chemical linker, a gel, a colloid, a magnetic field, or an electrical field.
233. The method of claim 220, further comprising detecting an activity of the moiety or moiety recognition agent.
234. The method of claim 233, further comprising picking the one or more first nucleic acid with an automated robot.
235. The method of claim 233, further comprising picking the one or more first nucleic acid by placing a capillary on a region comprising the detected activity of the moiety or moiety recognition agent.
236. The method of claim 220, wherein the moiety or moiety in contact with the moiety recognition agent cleaves a cleavable linker, which linker attaches the first nucleic acid to a solid substrate.
237. A method of producing duplicate arrays of shuffled or mutagenized nucleic acids, the method comprising: providing a physical or logical array of shuffled or mutagenized nucleic acids or transcribed shuffled or transcribed mutagenized nucleic acids; and, forming a duplicate array of copies of the shuffled or mutagenized nucleic acids or copies of the transcribed shuffled or transcribed mutagenized nucleic acids by physically or logically organizing the copies into a physical or logical array.
238. The physical or logical array and duplicate array produced by the method of claim 237.
239. The method of claim 237, wherein the copies are produced by copying the shuffled or mutagenized nucleic acids or transcribed shuffled or transcribed mutagenized nucleic acids using a polymerase or an in vitro nucleic acid synthesizer.
240. The method of claim 237, further comprising forming an array of reaction mixtures which corresponds to the physical or logical array of shuffled or mutagenized nucleic acids or transcribed shuffled or transcribed mutagenized nucleic acids, which reaction mixtures comprise members of the array of shuffled or mutagenized nucleic acids or transcribed shuffled or transcribed mutagenized nucleic acids or the duplicate array of copies of the shuffled or mutagenized nucleic acids or copies of the transcribed shuffled or transcribed mutagenized nucleic acids, or a derivative copy thereof.
241. The method of claim 240, wherein the reaction mixtures further comprise one or more in vitro transcription or translation reagent.
242. A method of normalizing an array of reaction mixtures, the method comprising: in vitro transcribing or translating a physical or logical array of shuffled or mutagenized nucleic acids or transcribed shuffled or transcribed mutagenized nucleic acids to produce an array of products; and, determining a correction factor which accounts for variation in concentration of the products at different sites in the array of products.
 243. The method of claim 242, further comprisingproducing a secondary product array, which secondary array comprisesselected concentrations of the products at one or more sites in thesecondary array.
 244. The physical or logical array of shuffled ormutagenized nucleic acids or transcribed shuffled or transcribedmutagenized nucleic acids, the array of products and the secondary arrayproduced by the method of claim
 243. 245. The method of claim 243,wherein the secondary array is formed by transferring an aliquot from aplurality of sites in the array of products to a plurality of secondarysites in the secondary array.
 246. The method of claim 245, furthercomprising diluting the products during said transferring or aftertransfer to the secondary sites, thereby selecting the concentration ofthe products at the secondary sites in the secondary array.
 247. Amethod for recombining one or more nucleic acids, the method comprising:(a) immobilizing one or more template nucleic acids on a solid support;(b) annealing a plurality of partially overlapping complementary nucleicacid fragments to the immobilized template nucleic acid; (c) extendingor ligating the annealed fragments to produce at least one heteroduplex,which heteroduplex comprises a template nucleic acid and a substantiallyfull-length heterolog complementary to the template nucleic acid; and,(d) recovering at least one substantially full-length heterolog. 248.The method of claim 247, comprising immobilizing a plurality of templatenucleic acids on a solid support.
 249. The method of claim 248, wherein the plurality of template nucleic acids comprises substantially an entire genome.
 250. The method of claim 249, wherein the plurality of template nucleic acids comprises a bacterial or fungal genome.
 251. The method of claim 248, wherein the plurality of template nucleic acids comprises substantially all of the expression products of a cell, tissue or organism.
 252. The method of claim 251, wherein the plurality of template nucleic acids comprises the expression products of a eukaryotic cell, tissue or organism.
 253. The method of claim 248, wherein the plurality of template nucleic acids comprises a subset of the expression products of a cell, tissue or organism.
 254. The method of claim 253, wherein the plurality of template nucleic acids comprises the expression products of a eukaryotic cell, tissue or organism.
 255. The method of claim 248, wherein the plurality of template nucleic acids comprises a library of genomic nucleic acids or cellular expression products.
 256. The method of claim 255, wherein the library of cellular expression products comprises a cDNA library.
 257. The method of claim 248, comprising immobilizing the plurality of template nucleic acids in a spatial array.
 258. The method of claim 247, wherein the one or more template nucleic acids comprise one or more of: a DNA, an RNA, a coding RNA, a coding DNA, an antisense RNA, an antisense DNA, a non-coding RNA, a non-coding DNA, an artificial RNA, an artificial DNA, a synthetic RNA, a synthetic DNA, a substituted RNA, a substituted DNA, a naturally occurring RNA, a naturally occurring DNA, a genomic RNA, a genomic DNA, or a cDNA.
 259. The method of claim 247, comprising immobilizing one or more template nucleic acids on a solid support selected from among a glass support, a plastic support, a silicon support, a chip, a bead, a pin, a filter, a membrane, a microtiter plate, and a slide.
 260. The method of claim 247, comprising immobilizing the one or more template nucleic acids by depositing a solution comprising the one or more template nucleic acids on a glass slide, which glass slide is coated with a polycationic polymer.
 261. The method of claim 260, wherein the polycationic polymer comprises polylysine or polyarginine.
 262. The method of claim 259, comprising immobilizing the one or more template nucleic acids by tethering the one or more template nucleic acids to the solid support.
 263. The method of claim 262, wherein tethering comprises chemical tethering, biotin-mediated binding, uv cross-linking, fluorescence activated cross-linking, or heat mediated cross-linking.
 264. The method of claim 247, comprising enzymatically extending the annealed fragments with a DNA or RNA polymerase.
 265. The method of claim 264, comprising enzymatically extending the annealed fragments with a thermostable polymerase.
 266. The method of claim 247, comprising enzymatically extending the annealed fragments with a ligase or nuclease, which ligase or nuclease comprises polymerase activity.
 267. The method of claim 247, comprising extending and ligating the annealed fragments to produce at least one substantially full-length heterolog.
 A substantially full-length heterolog produced by the method of claim 247.
 268. An array comprising a plurality of heteroduplexes or full-length heterologs produced by the method of claim 247.
 269. The method of claim 247, comprising recovering the at least one substantially full-length heterolog by (i) denaturing the heteroduplex; (ii) annealing at least one oligonucleotide primer to the heterolog; and, (iii) extending the oligonucleotide primer to produce a duplex polynucleotide.
 270. The method of claim 269, further comprising amplifying the duplex polynucleotide.
 271. The method of claim 270, comprising amplifying the duplex polynucleotide using a boomerang sequence, a splinkerette or a vectorette.
 272. An amplified heterolog produced by the method of claim 270.
 273. The method of claim 269, further comprising introducing the duplex polynucleotide into a cell.
 274. The method of claim 273, comprising introducing the duplex polynucleotide into a cell via a vector.
 275. The method of claim 274, wherein the vector is a plasmid, a cosmid, a phage or a transposon.
 276. A vector produced by the method of claim 274.
 277. A cell produced by the method of claim 273.
 278. The method of claim 247, further comprising identifying at least one substantially full-length heterolog with a desired property.
 279. The method of claim 278, comprising identifying the at least one substantially full-length heterolog with a desired property in an automated or partially automated high-throughput assay system.
 280. The method of claim 247, further comprising: (i) recombining or mutating the at least one substantially full-length heterolog to produce a library of diversified heterologs; and (ii) optionally, identifying at least one diversified heterolog with a desired property.
 281. A library of diversified heterologs produced by the method of claim 280.
 282. An integrated system comprising an array, which array comprises a plurality of heteroduplexes or full-length heterologs produced by the method of claim 247.
 283. The integrated system of claim 282, further comprising one or more of a detector, a data input device, a data output device, a data storage device, and a controller.
 284. The integrated system of claim 283, wherein the controller comprises one or more of a fluid handling mechanism, an array mobilization mechanism, and an array storage device.
 285. A method of directing nucleic acid fragmentation using a computer, the method comprising: calculating a ratio of uracil to thymidine, which ratio when used in a fragmentation module produces one or more nucleic acid fragment of a selected length.
 286. A method of directing PCR using a computer, the method comprising: calculating one or more crossover region between two or more parental nucleic acid sequence using one or more annealing temperature or extension temperature.
 287. The method of claim 286, comprising calculating the one or more crossover region using one or more theoretical prediction or one or more set of empirical data to calculate a melting temperature.
 288. A method of selecting one or more parental nucleic acids for diversity generation using a computer, the method comprising: (i) performing an alignment between two or more potential parental nucleic acid sequences; (ii) calculating a number of mismatches between the alignment; (iii) calculating a melting temperature for one or more window of w bases in the alignment; (iv) identifying one or more window of w bases having a melting temperature greater than x; (v) identifying one or more crossover segment in the alignment, which one or more crossover segment comprises two or more windows having a melting temperature greater than x, which two or more windows are separated by no more than n nucleotides; (vi) calculating a dispersion of the one or more crossover segments; (vii) calculating a first score for each alignment based on the number of windows having a melting temperature greater than x, the dispersion, and the number of crossover segments identified; (viii) calculating a second score based on the number of mismatches, the number of windows having a melting temperature greater than x, the dispersion, and the number of crossover segments identified; and, (ix) selecting one or more parental nucleic acid based on the first score and/or the second score.
 289. The method of claim 288, further comprising repeating steps (i) through (viii) starting with the one or more parental nucleic acid selected in step (ix).
 290. The method of claim 288, further comprising repeating steps (i) through (viii) starting with the one or more potential parental nucleic acid sequences and one or more different input parameters for calculating the melting temperature in step (ii).
 291. The method of claim 288, wherein the alignment comprises a pairwise alignment.
 292. The method of claim 288, wherein w comprises an odd number.
 293. The method of claim 288, wherein w comprises about 21.
 294. The method of claim 288, further comprising calculating the melting temperature for the one or more window of w bases in the alignment from one or more set of empirical data or one or more melting temperature prediction algorithm.
 295. The method of claim 288, wherein x is about 65° C.
 296. The method of claim 288, wherein n is about 2.
 297. The method of claim 288, wherein the dispersion comprises the inverse of the average number of bases between crossover segments in the alignment.
 298. The method of claim 288, wherein the instruction set selects the two or more potential parental nucleic acid sequences by searching one or more database for one or more nucleic acid sequence of interest and one or more homolog of the one or more nucleic acid sequence of interest.
 299. A web page for directing nucleic acid diversity generation, the web page comprising a computer readable medium that causes a computer to perform the method of claim 285, claim 286, or claim 288.
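
By way of illustration only, steps (iii) through (vi) of claim 288, together with the dispersion of claim 297, might be sketched in Python as follows. The claims do not specify an implementation, so everything here is an assumption: the function names (basic_tm, hot_windows, crossover_segments, dispersion) are hypothetical, the melting temperature estimate is a generic GC-content rule of thumb with an arbitrary per-mismatch penalty standing in for the empirical data or prediction algorithm of claim 294, and "separated by no more than n nucleotides" is read as the number of alignment columns lying between two windows. The scores of steps (vii) and (viii) are not sketched because the claims give no formula for them.

```python
from typing import Callable, List, Tuple


def basic_tm(a: str, b: str, mismatch_penalty: float = 5.0) -> float:
    """Stand-in melting temperature (deg C) for one aligned window.

    Assumption, not taken from the claims: a GC-content rule of thumb
    applied to strand `a`, lowered by an arbitrary penalty for every
    mismatched or gapped column.
    """
    a, b = a.upper(), b.upper()
    gc = sum(base in "GC" for base in a)
    mismatches = sum(p != q or p == "-" for p, q in zip(a, b))
    return 64.9 + 41.0 * (gc - 16.4) / len(a) - mismatch_penalty * mismatches


def hot_windows(a: str, b: str, w: int = 21, x: float = 65.0,
                tm: Callable[[str, str], float] = basic_tm) -> List[int]:
    """Steps (iii)-(iv): start columns of windows of w bases with Tm > x."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return [i for i in range(len(a) - w + 1)
            if tm(a[i:i + w], b[i:i + w]) > x]


def crossover_segments(starts: List[int], w: int = 21,
                       n: int = 2) -> List[Tuple[int, int]]:
    """Step (v): group hot windows into (first, last) column segments.

    Two consecutive windows join the same segment when at most n columns
    lie between them (overlapping windows always join). A segment needs
    two or more windows.
    """
    segments: List[Tuple[int, int]] = []
    run: List[int] = []
    for s in sorted(starts):
        if run and s - (run[-1] + w) > n:
            if len(run) >= 2:
                segments.append((run[0], run[-1] + w - 1))
            run = []
        run.append(s)
    if len(run) >= 2:
        segments.append((run[0], run[-1] + w - 1))
    return segments


def dispersion(segments: List[Tuple[int, int]]) -> float:
    """Step (vi), per claim 297: inverse of the mean gap between segments.

    Returns 0.0 when fewer than two segments exist (an assumed convention).
    """
    gaps = [nxt[0] - cur[1] - 1 for cur, nxt in zip(segments, segments[1:])]
    if not gaps:
        return 0.0
    return 1.0 / (sum(gaps) / len(gaps))
```

With the values recited in claims 293, 295 and 296, a caller would pass w=21, x=65.0 and n=2. The tm argument is left as a parameter so that a nearest-neighbor model or a table of measured values can replace the stand-in estimate.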